Hi all,

Thank you for the feedback. Looking at the responses, it seems like there is a consensus to move forward with fastavro as the default implementation on Python 3.

There are two questions left, however:

- Should fastavro also become the default implementation on Python 2? This is a trade-off between having a consistent API across Python versions and keeping the current behavior on Python 2.
- Should we keep the avro-python3 dependency? With the proposed solution, we could remove the avro-python3 dependency, but it might have to be re-added if we want to support Avro again on Python 3 in a future version.

Kind regards,
Robbe

Robbe Sneyders
ML6 Gent <https://ml6.eu/>

On Thu, 28 Mar 2019 at 18:28, Ahmet Altay wrote:

> Hi Ismaël,
>
> It is great to hear that Avro is planning to make a release soon.
>
> To answer your concerns, fastavro has a set of tests using regular avro
> files [1] and it also has a large set of users (with 675470 package
> downloads). This is in addition to it being a py2 & py3 compatible package
> and offering ~7x performance improvements [2]. Another data point: we were
> testing fastavro for a while behind an experimental flag and have not seen
> issues related to compatibility.
>
> pyavro-rs sounds promising; however, I could not find a released version of
> it on PyPI. The source code does not look maintained either, with the
> last commit on Jul 2, 2018 (for comparison, the last change on fastavro was
> on Mar 19, 2019).
>
> I think given the state of things, it makes sense to switch to fastavro as
> the default implementation to unblock Python 3 changes. When avro offers a
> similar level of performance we could switch back without any visible user
> impact.
>
> Ahmet
>
> [1] https://github.com/fastavro/fastavro/tree/master/tests
> [2] https://pypi.org/project/fastavro/
>
> On Thu, Mar 28, 2019 at 7:53 AM Ismaël Mejía wrote:
>
>> Hello,
>>
>> The problem of switching implementations is the risk of losing
>> interoperability, and this is more important than performance. Does
>> fastavro have tests that guarantee that it is fully compatible with
>> Avro's Java version? (given that it is the de-facto implementation
>> used everywhere).
>>
>> If performance is a more important criterion, maybe it is worth checking
>> pyavro-rs [1]; you can take a look at its performance in the great
>> talk of last year [2].
>>
>> I have been involved actively in the Avro community in the last months
>> and I am now a committer there. Also Dan Kulp, who has done multiple
>> contributions in Beam, is now a PMC member too. We are at this point
>> working hard to get the next release of Avro out; actually the branch
>> cut of Avro 1.9.0 is happening this week, and we plan to improve the
>> release cadence. Please understand that the issue with Avro is that it
>> is a really specific and 'old' project (~10 years), so part of the
>> active community moved to other areas because it is stable, but we are
>> still there working on it and we are eager to improve it for everyone's
>> needs (and of course Beam's needs).
>>
>> I know that Python 3's Avro implementation is still lacking and could
>> be improved (views expressed here are clearly valid), but maybe this
>> is a chance to contribute there too. Remember Apache projects are a
>> family and we have a history of cross-collaboration with other
>> communities, e.g. Flink, Calcite, so why not give Avro a chance
>> too.
>>
>> Regards,
>> Ismaël
>>
>> [1] https://github.com/flavray/pyavro-rs
>> [2] https://ep2018.europython.eu/media/conference/slides/how-to-write-rust-instead-of-c-and-get-away-with-it-yes-its-a-python-talk.pdf
>>
>> On Wed, Mar 27, 2019 at 11:42 PM Chamikara Jayalath wrote:
>> >
>> > +1 for making use_fastavro the default for Python 3. I don't see any significant drawbacks in doing this from Beam's point of view. One concern is whether avro and fastavro can safely co-exist in the same environment so that Beam continues to work for users who already have the avro library installed.
>> >
>> > Note that there are two use_fastavro flags (confusingly enough):
>> > (1) for the avro file source [1]
>> > (2) an experiment flag [2] with the same name that makes the Dataflow runner use the fastavro library for reading/writing intermediate files and for reading Avro files exported by BigQuery.
>> >
>> > I can help with the latter.
>> >
>> > [1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py#L81
>> > [2] https://lists.apache.org/thread.html/94bd362a3a041654e6ef9003fb3fa797e25274fdb4766065481a0796@%3Cuser.beam.apache.org%3E
>> >
>> > Thanks,
>> > Cham
>> >
>> > On Wed, Mar 27, 2019 at 3:27 PM Valentyn Tymofieiev wrote:
>> >>
>> >> Thanks, Robbe and Frederik, for raising this.
>> >>
>> >> Over the course of making Beam Python 3 compatible, this is at least the second time [1] we have had to deal with an error in the avro-python3 package. The release cadence of Apache Avro (1 release a year) is concerning to me [2]. Even if we have a new release with Python 3 fixes soon, as Beam users start using Beam more actively on Python 3, we may encounter more issues in avro-python3. If this happens, Beam will have to monkey-patch its way around the avro-python3 issues, because waiting for the next Avro release may not be practical.
>> >>
>> >> So, I agree that it is a good time to start transitioning off of the avro/avro-python3 dependency, given that fastavro is known to be a faster alternative [3] and is released monthly [4].
>> >>
>> >> There are a couple of ways to make this transition depending on how careful we want to be. We should:
>> >>
>> >> 1. Remove the dependency on avro in the current codepath whenever fastavro is used, as you propose.
>> >> 2. Remove Beam's dependency on avro-python3 now, OR, if we want to be safer, make use_fastavro=True the default option on Python 3, but keep the dependency on avro-python3, and keep that codepath, even though it may not work right now on Py3, but might work after the next Avro release.
>> >> 3. Make use_fastavro=True the default option on Python 2.
>> >> 4. Remove Beam's dependency on avro and avro-python3 after several releases.
>> >>
>> >> Adding +Chamikara Jayalath and +Udi Meiri, who have been working on Beam IOs and may have some thoughts here. Do you think that it is safe to make use_fastavro=True the default option for both Py2 and Py3 now? If we make use_fastavro the default option on Py3, do you think there is a benefit to still keeping the Avro codepath on Py3, or can we remove it?
>> >>
>> >> Thanks,
>> >> Valentyn
>> >>
>> >> [1] https://github.com/apache/avro/pull/436
>> >> [2] https://avro.apache.org/releases.html
>> >> [3] https://medium.com/@abrarsheikh/benchmarking-avro-and-fastavro-using-pytest-benchmark-tox-and-matplotlib-bd7a83964453
>> >> [4] https://pypi.org/project/fastavro/#history
>> >>
>> >> On Wed, Mar 27, 2019 at 10:49 AM Robbe Sneyders wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> We're looking at fixing avroio on Python 3, which still fails due to a non-picklable schema class in Avro [1]. This is fixed when using the latest Avro master, but the last release dates back to May 2017.
>> >>>
>> >>> Fastavro does not have the same problem, but is currently also failing due to a dependency of avroio on Avro for schema parsing.
>> >>>
>> >>> We would therefore propose to (temporarily?) deprecate Avro on Python 3, and implement a pure fastavro solution instead. +Frederik Bode already submitted a PR for this [2].
>> >>>
>> >>> Use of fastavro is currently activated with the `use_fastavro` flag, which defaults to False. Since this flag would not make sense anymore on Python 3, we would like to switch the default value to True. The documentation already mentions that this will probably become the default in the long term, but this change would also impact Python 2. Is this a problem?
>> >>>
>> >>> Also, looking at the performance gain of fastavro, is there any reason not to deprecate Avro in favor of fastavro on Python 3 indefinitely?
>> >>>
>> >>> [1] https://issues.apache.org/jira/browse/BEAM-6522#comment-16784499
>> >>> [2] https://github.com/apache/beam/pull/8130
>> >>>
>> >>> Kind regards,
>> >>> Robbe
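
For illustration only, here is a minimal sketch of the flag behavior discussed in this thread: fastavro as the default on Python 3, the existing default kept on Python 2 until that question is settled, and an explicit user choice always taking precedence. The helper names `default_use_fastavro` and `resolve_use_fastavro` are hypothetical and are not part of Beam's avroio API; only the `use_fastavro` flag name comes from the thread.

```python
import sys


def default_use_fastavro(python_major=None):
    """Hypothetical helper: default value for a use_fastavro-style flag."""
    if python_major is None:
        python_major = sys.version_info[0]
    # Assumption taken from the thread: True on Python 3, while Python 2
    # keeps the current default (False) until the open question is decided.
    return python_major >= 3


def resolve_use_fastavro(use_fastavro=None, python_major=None):
    """An explicit True/False wins; None means 'fall back to the default'."""
    if use_fastavro is None:
        return default_use_fastavro(python_major)
    return bool(use_fastavro)
```

Switching the Python 2 default later would then be a one-line change in `default_use_fastavro`, without touching callers that pass the flag explicitly.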
