Hi,
I just started trying out Parquet, and ran into a performance issue. I
was using the Avro support to try working with a test schema, using
the 'standalone' approach from here:
http://blog.cloudera.com/blog/2014/05/how-to-convert-existing-data-into-parquet/

I took an existing Avro schema, consisting of a few columns each
containing a map, and wrote, then read back, about 40MB of data using
both Avro's own serialisation, and Parquet's. Parquet's ended up being
about five times slower. The ratio held when I moved up to ~1GB of
data. I'd expect it to be somewhat slower, as I was reading
back all columns, but five times seems high. Is there anything simple
I might be missing?
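
For reference, here's roughly the write/read path I'm using. This is a
minimal sketch assuming the parquet-avro AvroParquetWriter and
AvroParquetReader classes from the blog post above; the schema file,
output path, and record count are placeholders, not my real test setup:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import parquet.avro.AvroParquetReader;
import parquet.avro.AvroParquetWriter;

public class ParquetRoundTrip {
    public static void main(String[] args) throws Exception {
        // Placeholder schema: a few map-valued columns, loaded from an .avsc file.
        Schema schema = new Schema.Parser().parse(
            ParquetRoundTrip.class.getResourceAsStream("/test.avsc"));
        Path file = new Path("test.parquet");

        // Write: hand each Avro GenericRecord to the Parquet writer.
        AvroParquetWriter<GenericRecord> writer =
            new AvroParquetWriter<GenericRecord>(file, schema);
        for (int i = 0; i < 1000; i++) {
            GenericRecord record = new GenericData.Record(schema);
            // ... populate the map-valued fields here ...
            writer.write(record);
        }
        writer.close();

        // Read back: iterate until the reader returns null (end of file).
        // Note this materialises every column of every record.
        AvroParquetReader<GenericRecord> reader =
            new AvroParquetReader<GenericRecord>(file);
        GenericRecord r;
        while ((r = reader.read()) != null) {
            // ... consume record ...
        }
        reader.close();
    }
}
```

The Avro baseline is the equivalent loop with DataFileWriter/DataFileReader
over the same records.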
Thanks
Rob
