Deepak Majeti commented on PARQUET-684:

I looked at the code and the blog briefly. The current implementation works for 
dictionary indices that are bit-packed.
This implementation will have to be extended to support the RLE/bit-packed hybrid 
encoding currently used by parquet-cpp to encode dictionary index values.
Encoding details here: 

I guess the RLE encoding of indices will further improve performance, since 
an RLE run will not require the costly gather instruction.
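
To make the point concrete, here is a rough scalar sketch of decoding the 
hybrid encoding as I understand it from the Parquet format description: each 
run starts with a varint header whose low bit selects bit-packed (1) or RLE (0). 
The name DecodeHybrid is made up for illustration and is not parquet-cpp's API; 
the sketch assumes bit widths of 1 to 32 and ignores any page-level framing 
around the runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not parquet-cpp's API): decodes a sequence of
// RLE/bit-packed hybrid runs into dictionary indices.
std::vector<uint32_t> DecodeHybrid(const std::vector<uint8_t>& buf, int bit_width) {
  assert(bit_width >= 1 && bit_width <= 32);
  std::vector<uint32_t> out;
  size_t pos = 0;
  while (pos < buf.size()) {
    // Run header: unsigned LEB128 varint.
    uint32_t header = 0;
    int shift = 0;
    while (pos < buf.size() && shift < 32) {
      uint8_t b = buf[pos++];
      header |= static_cast<uint32_t>(b & 0x7F) << shift;
      if ((b & 0x80) == 0) break;
      shift += 7;
    }
    if (header & 1) {
      // Bit-packed run: (header >> 1) groups of 8 values, packed LSB first.
      size_t groups = header >> 1;
      size_t count = groups * 8;
      size_t nbytes = groups * static_cast<size_t>(bit_width);
      if (pos + nbytes > buf.size()) break;  // truncated input
      size_t base_bit = pos * 8;
      for (size_t i = 0; i < count; ++i) {
        uint32_t v = 0;
        for (int b = 0; b < bit_width; ++b) {
          size_t bit = base_bit + i * bit_width + b;
          v |= static_cast<uint32_t>((buf[bit >> 3] >> (bit & 7)) & 1u) << b;
        }
        out.push_back(v);
      }
      pos += nbytes;
    } else {
      // RLE run: one value stored in ceil(bit_width / 8) little-endian
      // bytes, repeated (header >> 1) times.
      size_t count = header >> 1;
      size_t nbytes = (static_cast<size_t>(bit_width) + 7) / 8;
      if (pos + nbytes > buf.size()) break;  // truncated input
      uint32_t v = 0;
      for (size_t k = 0; k < nbytes; ++k) {
        v |= static_cast<uint32_t>(buf[pos + k]) << (8 * k);
      }
      pos += nbytes;
      out.insert(out.end(), count, v);  // plain fill, no per-element lookup
    }
  }
  return out;
}
```

In the RLE branch the whole run maps to a single index, so the dictionary 
lookup can be done once and the result broadcast, whereas the bit-packed 
branch still yields a distinct index per value.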

> [C++] Hardware optimizations for dictionary / RLE encoding/decoding
> -------------------------------------------------------------------
>                 Key: PARQUET-684
>                 URL: https://issues.apache.org/jira/browse/PARQUET-684
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Deepak Majeti
> See discussion in 
> https://github.com/apache/parquet-cpp/pull/140
> and experiments from Daniel Lemire in 
> https://github.com/lemire/dictionary

This message was sent by Atlassian JIRA