Hi Miki,

Yes, I followed your model in rewriting the Avro reader, but I kept the schema resolution step so that you can still specify separate writer and reader schemas. Your code is still 2.5x faster than mine when using the C extensions.
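A minimal sketch of the resolve-up-front idea discussed in this thread, for illustration only. It is not the code from either rewrite or from fastavro. The names `resolve_record`, `read_record`, and the layout of the plan dictionary are hypothetical, and the sketch assumes flat records whose fields are only `long` or `string`, with no type promotion, unions, or nested types:

```python
import io


def read_long(buf):
    """Decode an Avro long (zigzag-encoded varint) from a binary stream."""
    shift = 0
    acc = 0
    while True:
        byte = buf.read(1)[0]  # raises IndexError on truncated input
        acc |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)


def read_string(buf):
    """Decode an Avro string: a long byte count followed by UTF-8 bytes."""
    length = read_long(buf)
    return buf.read(length).decode("utf-8")


# Only two primitive types are supported in this sketch.
PRIMITIVE_READERS = {"long": read_long, "string": read_string}


def resolve_record(writer, reader):
    """Resolve writer and reader record schemas once, before reading any data.

    Returns a plain dict (the hypothetical "plan") that read_record consumes.
    """
    reader_fields = {f["name"]: f for f in reader["fields"]}
    writer_names = {f["name"] for f in writer["fields"]}

    steps = []
    for field in writer["fields"]:
        keep = field["name"] in reader_fields
        if keep and reader_fields[field["name"]]["type"] != field["type"]:
            raise ValueError("type promotion is not handled in this sketch")
        # Fields unknown to the reader must still be decoded so they are skipped.
        steps.append((field["name"], PRIMITIVE_READERS[field["type"]], keep))

    defaults = {}
    for field in reader["fields"]:
        if field["name"] not in writer_names:
            if "default" not in field:
                raise ValueError("reader field %r has no default" % field["name"])
            defaults[field["name"]] = field["default"]

    return {"steps": steps, "defaults": defaults}


def read_record(buf, plan):
    """Read one record in writer field order using the precomputed plan."""
    record = dict(plan["defaults"])
    for name, read, keep in plan["steps"]:
        value = read(buf)
        if keep:
            record[name] = value
    return record


writer_schema = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "legacy", "type": "long"},
]}
reader_schema = {"type": "record", "name": "Event", "fields": [
    {"name": "name", "type": "string"},
    {"name": "id", "type": "long"},
    {"name": "source", "type": "string", "default": "unknown"},
]}

plan = resolve_record(writer_schema, reader_schema)
# id=3, name="ab", legacy=-1 in Avro binary encoding
record = read_record(io.BytesIO(b"\x06\x04ab\x01"), plan)
```

The per-record loop does no schema lookups: which fields to keep, which to skip, and which defaults to fill in are all decided once in `resolve_record`. A loop this simple is also the kind of code that tends to benefit from Cython compilation.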
I personally find the current API somewhat confusing, so I'd be in favor of changing it.

Uri

On Mon, Apr 29, 2013 at 2:32 PM, Miki Tebeka <miki.teb...@gmail.com> wrote:

> Hi,
>
> I did the same for fastavro <https://bitbucket.org/tebeka/fastavro>. I
> found changing the current code while keeping the same API very hard.
>
> Another option we can take is to leave the current code as version 1 and
> add the new code either as a new module under avro or as avro2.
>
> All the best,
> --
> Miki
>
>
> On Sun, Apr 28, 2013 at 10:24 PM, Uri Laserson <laser...@cloudera.com> wrote:
>
> > Hi all,
> >
> > I rewrote some of the Python code to read Avro files. I was able to
> > achieve a ~3x speedup over the current impl, and can probably do better
> > if it was cleaned up more. The main changes are:
> > * Eliminated the object-oriented nature of the reader. It's just
> >   functions now. Presumably this can be changed back, but it didn't
> >   really seem like there was any reason for it.
> > * Given a reader and writer schema, it precomputes as much helpful info
> >   as it can upfront and caches this in a dictionary that the read
> >   functions use.
> > * The code is compiled with Cython for speedup.
> >
> > How can this be used to improve the current Python API? Let me know how
> > I can be helpful...
> >
> > Uri
> >
> > --
> > Uri Laserson, PhD
> > Data Scientist, Cloudera
> > Twitter/GitHub: @laserson
> > +1 617 910 0447
> > laser...@cloudera.com
> >
>

--
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447
laser...@cloudera.com