Re: 3x faster python reader

Uri Laserson Tue, 30 Apr 2013 00:51:16 -0700

Hi Miki,

Yes, I followed your model in rewriting the Avro reader, but I kept schema
resolution, so you can still specify separate writer and reader schemas.
Your code is still 2.5x faster than mine when using the C extensions.
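
To make the approach concrete, here is a rough sketch (not the actual code)
of resolving a writer/reader schema pair once up front, caching the result
in a dictionary, and handing it to plain read functions. It is a sketch
under assumptions: it only handles flat records with int, long, and string
fields, it ignores type promotion, and build_record_plan, read_record, and
the layout of the plan dictionary are names made up for illustration.

```python
import io


def read_long(buf):
    """Decode one Avro int/long (zigzag-encoded varint) from a binary stream."""
    shift = 0
    result = 0
    while True:
        byte = buf.read(1)[0]  # raises IndexError on a truncated stream
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_string(buf):
    """Decode an Avro string: a long byte count followed by UTF-8 bytes."""
    length = read_long(buf)
    return buf.read(length).decode("utf-8")


# Illustrative subset of primitive types; a real reader needs many more.
PRIMITIVE_READERS = {"int": read_long, "long": read_long, "string": read_string}


def build_record_plan(writer_schema, reader_schema):
    """Resolve the two record schemas once and cache the result in a dict."""
    reader_fields = {f["name"]: f for f in reader_schema["fields"]}
    writer_names = {f["name"] for f in writer_schema["fields"]}

    # Data is laid out in writer order; fields the reader does not know
    # are still decoded (to advance the stream) but then dropped.
    steps = [
        (f["name"], PRIMITIVE_READERS[f["type"]], f["name"] in reader_fields)
        for f in writer_schema["fields"]
    ]

    # Reader-only fields are filled from their declared defaults.
    defaults = {}
    for name, field in reader_fields.items():
        if name not in writer_names:
            if "default" not in field:
                raise ValueError("no default for reader field %r" % name)
            defaults[name] = field["default"]

    return {"steps": steps, "defaults": defaults}


def read_record(buf, plan):
    """Read one record using a precomputed plan; no schema work happens here."""
    record = dict(plan["defaults"])
    for name, read, keep in plan["steps"]:
        value = read(buf)
        if keep:
            record[name] = value
    return record


if __name__ == "__main__":
    writer = {"type": "record", "name": "User", "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ]}
    reader = {"type": "record", "name": "User", "fields": [
        {"name": "name", "type": "string"},
        {"name": "source", "type": "string", "default": "unknown"},
    ]}
    plan = build_record_plan(writer, reader)  # built once per file
    print(read_record(io.BytesIO(b"\x06\x04ab"), plan))
```

The point of the split is that build_record_plan runs once per file, while
read_record runs once per record and only walks a precomputed list.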


I personally find the current API somewhat confusing, so I'd be in favor of
changing it.

Uri


On Mon, Apr 29, 2013 at 2:32 PM, Miki Tebeka <miki.teb...@gmail.com> wrote:

> Hi,
>
> I did the same for fastavro <https://bitbucket.org/tebeka/fastavro>. I
> found changing the current code while keeping the same API very hard.
>
> Another option is to leave the current code as version 1 and add the new
> code either as a new module under avro or as avro2.
>
> All the best,
> --
> Miki
>
>
> On Sun, Apr 28, 2013 at 10:24 PM, Uri Laserson <laser...@cloudera.com> wrote:
>
> > Hi all,
> >
> > I rewrote some of the Python code that reads Avro files.  I was able to
> > achieve a ~3x speedup over the current implementation, and could probably
> > do better if it were cleaned up more.  The main changes are:
> > * Eliminated the object-oriented nature of the reader.  It's just
> > functions now.  Presumably this can be changed back, but there didn't
> > really seem to be any reason for it.
> > * Given a reader and writer schema, it precomputes as much helpful info
> > as it can upfront and caches this in a dictionary that the read
> > functions use.
> > * The code is compiled with Cython for speedup.
> >
> > How can this be used to improve the current Python API?  Let me know how
> > I can be helpful...
> >
> > Uri
> >
> > --
> > Uri Laserson, PhD
> > Data Scientist, Cloudera
> > Twitter/GitHub: @laserson
> > +1 617 910 0447
> > laser...@cloudera.com
> >
>



-- 
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447
laser...@cloudera.com
