Hi Ismaël,

It is great to hear that Avro is planning to make a release soon.

To answer your concerns, fastavro has a set of tests using regular avro
files [1] and it also has a large set of users (with 675470 package
downloads). This is in addition to it being a py2 & py3 compatible package
and offering ~7x performance improvements [2]. As another data point, we
have been testing fastavro for a while behind an experimental flag and have
not seen any issues related to compatibility.
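
For anyone who wants to run a quick interoperability check of their own,
here is a minimal sketch. The helper name records_match is hypothetical, and
the check is only meaningful when the input bytes come from a file written
by another implementation (for example the Java library). It uses
fastavro.reader, which iterates over the records of an Avro container file.

```python
import io


def records_match(avro_bytes, expected_records):
    """Return True if fastavro decodes avro_bytes into expected_records.

    avro_bytes is the full content of an Avro container file, ideally one
    written by a different Avro implementation. Hypothetical helper for
    illustration only; it is not part of fastavro or Beam.
    """
    # Imported lazily so that this module can be loaded without fastavro.
    import fastavro

    decoded = list(fastavro.reader(io.BytesIO(avro_bytes)))
    return decoded == list(expected_records)
```

A check like this only compares decoded records, so it says nothing about
schema evolution or about files that fastavro writes and Java reads.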

pyavro-rs sounds promising; however, I could not find a released version of
it on PyPI. The source code does not appear to be actively maintained
either: the last commit was on Jul 2, 2018 (for comparison, the last change
to fastavro was on Mar 19, 2019).

I think given the state of things, it makes sense to switch to fastavro as
the default implementation to unblock python 3 changes. When avro offers a
similar level of performance we could switch back without any visible user
impact.
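
To illustrate why switching the default (and switching back later) need not
be visible to users, here is a rough sketch. The helper pick_avro_backend
and its is_installed parameter are hypothetical and are not Beam's actual
code; the fallback to the other library when the preferred one is missing
is also my own assumption, not something proposed in this thread.

```python
import importlib.util


def pick_avro_backend(use_fastavro=True, is_installed=None):
    """Return the name of the Avro library to use: 'fastavro' or 'avro'.

    Hypothetical helper. use_fastavro mirrors the flag discussed in this
    thread, with the proposed new default of True. is_installed is an
    optional callable (module name -> bool), mainly useful for testing.
    """
    if is_installed is None:
        # find_spec returns None when a top-level module cannot be found.
        def is_installed(name):
            return importlib.util.find_spec(name) is not None

    if use_fastavro:
        candidates = ("fastavro", "avro")
    else:
        candidates = ("avro", "fastavro")

    # Prefer the requested library, fall back to the other if it is missing.
    for name in candidates:
        if is_installed(name):
            return name
    raise ImportError("neither fastavro nor avro is installed")
```

With a helper like this, changing the default is a one-line change, and
users who pass use_fastavro explicitly keep the behavior they asked for.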

Ahmet

[1] https://github.com/fastavro/fastavro/tree/master/tests
[2] https://pypi.org/project/fastavro/

On Thu, Mar 28, 2019 at 7:53 AM Ismaël Mejía <ieme...@gmail.com> wrote:

> Hello,
>
> The problem of switching implementations is the risk of losing
> interoperability, and this is more important than performance. Does
> fastavro have tests that guarantee that it is fully compatible with
> Avro’s Java version? (given that it is the de-facto implementation
> used everywhere).
>
> If performance is a more important criterion, maybe it is worth taking
> a look at pyavro-rs [1]; you can see its performance in the great
> talk from last year [2].
>
> I have been involved actively in the Avro community in the last months
> and I am now a committer there. Also Dan Kulp who has done multiple
> contributions in Beam is now a PMC member too. We are at this point
> working hard to get the next release of Avro out, actually the branch
> cut of Avro 1.9.0 is happening this week, and we plan to improve the
> release cadence. Please understand that the issue with Avro is that it
> is a really specific and ‘old’ project (~10 years), so part of the
> active community moved to other areas because it is stable, but we are
> still there working on it and we are eager to improve it for everyone’s
> needs (and of course Beam's needs).
>
> I know that Python 3’s Avro implementation is still lacking and could
> be improved (views expressed here are clearly valid), but maybe this
> is a chance to contribute there too. Remember Apache projects are a
> family and we have a history of cross collaboration with other
> communities, e.g. Flink, Calcite, so why not give Avro a chance
> too.
>
> Regards,
> Ismaël
>
> [1] https://github.com/flavray/pyavro-rs
> [2]
> https://ep2018.europython.eu/media/conference/slides/how-to-write-rust-instead-of-c-and-get-away-with-it-yes-its-a-python-talk.pdf
>
> On Wed, Mar 27, 2019 at 11:42 PM Chamikara Jayalath
> <chamik...@google.com> wrote:
> >
> > +1 for making use_fastavro the default for Python3. I don't see any
> significant drawbacks in doing this from Beam's point of view. One concern
> is whether avro and fastavro can safely co-exist in the same environment so
> that Beam continues to work for users who already have the avro library
> installed.
> >
> > Note that there are two use_fastavro flags (confusingly enough).
> > (1) for avro file source [1]
> > (2) an experiment flag [2] with the same name that makes Dataflow runner
> use fastavro library for reading/writing intermediate files and for reading
> Avro files exported by BigQuery.
> >
> > I can help with the latter.
> >
> > [1]
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py#L81
> > [2]
> https://lists.apache.org/thread.html/94bd362a3a041654e6ef9003fb3fa797e25274fdb4766065481a0796@%3Cuser.beam.apache.org%3E
> >
> > Thanks,
> > Cham
> >
> > On Wed, Mar 27, 2019 at 3:27 PM Valentyn Tymofieiev <valen...@google.com>
> wrote:
> >>
> >> Thanks, Robbe and Frederik, for raising this.
> >>
> >> Over the course of making Beam Python 3 compatible this is at least the
> second time [1] we have had to deal with an error in the avro-python3
> package. The release cadence of Apache Avro (1 release a year)
> >> is concerning to me [2]. Even if we have a new release with Python 3
> fixes soon, as Beam users start to use Beam more actively on Python 3, we may
> encounter more issues in avro-python3. If this happens, Beam will have to
> monkey-patch its way around the avro-python3 issues, because waiting for
> the next Avro release may not be practical.
> >>
> >> So, I agree that it is a good time to start transitioning off of the
> avro/avro-python3 dependency, given that fastavro is known to be a faster
> alternative [3], and is released monthly [4].
> >>
> >> There are a couple of ways to make this transition depending on how
> careful we want to be. We should:
> >>
> >> 1. Remove the dependency on avro in the current codepath whenever
> fastavro is used, as you propose.
> >> 2. Remove the Beam dependency on avro-python3 now, OR, if we want to be
> safer, make use_fastavro=True the default option on Python 3, but keep the
> dependency on avro-python3, and keep that codepath, even though it may not
> work right now on Py3, but might work after the next Avro release.
> >> 3. Make use_fastavro=True the default option on Python 2.
> >> 4. Remove Beam dependency on avro and avro-python3 after several
> releases.
> >>
> >> Adding +Chamikara Jayalath and +Udi Meiri, who have been working on Beam
> IOs and may have some thoughts here. Do you think that it is safe to make
> use_fastavro=True the default option for both Py2 and Py3 now? If we make
> use_fastavro the default option on Py3, do you think there is a benefit to
> still keeping the Avro codepath on Py3, or can we remove it?
> >>
> >> Thanks,
> >> Valentyn
> >>
> >> [1] https://github.com/apache/avro/pull/436
> >> [2] https://avro.apache.org/releases.html
> >> [3]
> https://medium.com/@abrarsheikh/benchmarking-avro-and-fastavro-using-pytest-benchmark-tox-and-matplotlib-bd7a83964453
> >> [4] https://pypi.org/project/fastavro/#history
> >>
> >> On Wed, Mar 27, 2019 at 10:49 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We're looking at fixing avroio on Python 3, which still fails due to a
> non-picklable schema class in Avro [1]. This is fixed when using the latest
> Avro master, but the last release dates back to May 2017.
> >>>
> >>> Fastavro does not have the same problem, but is currently also failing
> due to a dependency of avroio on Avro for schema parsing.
> >>>
> >>> We would therefore propose to (temporarily?) deprecate Avro on Python
> 3, and implement a pure fastavro solution instead. +Frederik Bode already
> submitted a PR for this [2].
> >>>
> >>> Use of fastavro is currently activated with the `use_fastavro` flag,
> which defaults to False. Since this flag would not make sense anymore on
> Python 3, we would like to switch the default value to True. The
> documentation already mentions that this will probably become the default
> in the long term, but this change would also impact Python 2. Is this a
> problem?
> >>>
> >>> Also, looking at the performance gain of fastavro, is there any reason
> to not deprecate Avro in favor of fastavro on Python 3 indefinitely?
> >>>
> >>> [1] https://issues.apache.org/jira/browse/BEAM-6522#comment-16784499
> >>> [2] https://github.com/apache/beam/pull/8130
> >>>
> >>> Kind regards,
> >>> Robbe
>
