Hi all, I rewrote some of the python code to read avro files. I was able to achieve a ~3x speedup over the current impl, and can probably do better if it was cleaned up more. The main changes are: * Eliminated the object-oriented nature of the reader. It's just functions now. Presumably this can be changed back, but it didn't really seem like there was any reason for it. * Given a reader and writer schema, it precomputes as much helpful info as it can upfront and caches this in a dictionary that the read functions use * The code is compiled with Cython for speedup.
How can this be used to improve the current python api? Let me know how I can be helpful... Uri -- Uri Laserson, PhD Data Scientist, Cloudera Twitter/GitHub: @laserson +1 617 910 0447 laser...@cloudera.com