Eric Daniel created PARQUET-671:
-----------------------------------

             Summary: Improve performance of RLE/bit-packed decoding in parquet-cpp
                 Key: PARQUET-671
                 URL: https://issues.apache.org/jira/browse/PARQUET-671
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-cpp
            Reporter: Eric Daniel


Several changes can dramatically improve decoding performance:

- when decoding repeated values in the RLE/dictionary encoding, do the 
dictionary lookup only once per run
- when decoding bit-packed sequences, do the decoding in batches so the bit 
unpacker's state can be kept in registers (instead of updating members for 
every decoded value)
- use Daniel Lemire's fast unpacking routines whenever possible 
(https://github.com/lemire/FrameOfReference/)
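
As a rough illustration of the first two points (not the code from the PR), here is a minimal C++ sketch. The helper names DecodeRleRun and UnpackBatch are hypothetical and are not part of the parquet-cpp API. The sketch assumes bit widths between 1 and 32 and the LSB-first bit order used by Parquet's bit-packed runs. Lemire's unpacking routines are not shown; they would presumably replace the generic loop in UnpackBatch for the widths they support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expand one RLE run of a dictionary index.
// The dictionary is consulted once and the value is then replicated,
// instead of doing one lookup per output value.
template <typename T>
void DecodeRleRun(const std::vector<T>& dict, uint32_t index,
                  std::size_t run_length, T* out) {
  assert(index < dict.size());
  const T value = dict[index];  // single lookup for the whole run
  for (std::size_t i = 0; i < run_length; ++i) {
    out[i] = value;
  }
}

// Hypothetical helper: unpack `count` values of `bit_width` bits each
// from `in` (LSB-first). The unpacker state (buffer, bits_in_buffer, pos)
// lives in local variables for the whole batch, so the compiler is free
// to keep it in registers rather than writing members back per value.
inline void UnpackBatch(const uint8_t* in, std::size_t in_len, int bit_width,
                        std::size_t count, uint32_t* out) {
  assert(bit_width >= 1 && bit_width <= 32);
  assert(in_len * 8 >= count * static_cast<std::size_t>(bit_width));
  uint64_t buffer = 0;     // bits loaded but not yet consumed
  int bits_in_buffer = 0;  // number of valid bits in `buffer`
  std::size_t pos = 0;     // next input byte to load
  const uint64_t mask = (uint64_t{1} << bit_width) - 1;
  for (std::size_t i = 0; i < count; ++i) {
    // Refill one byte at a time until a full value is available.
    while (bits_in_buffer < bit_width) {
      buffer |= static_cast<uint64_t>(in[pos++]) << bits_in_buffer;
      bits_in_buffer += 8;
    }
    out[i] = static_cast<uint32_t>(buffer & mask);
    buffer >>= bit_width;
    bits_in_buffer -= bit_width;
  }
}
```

A decoder class would call UnpackBatch once per batch and update its own position members only after the call returns, rather than once per decoded value.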

I have a PR ready to implement these changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
