[
https://issues.apache.org/jira/browse/PARQUET-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497185#comment-15497185
]
Deepak Majeti commented on PARQUET-684:
---------------------------------------
I looked at the code and the blog briefly. The current implementation works for
dictionary indices that are bit-packed.
This implementation will have to be extended to support the RLE/bit-packed
hybrid encoding currently used by parquet-cpp to encode dictionary index values.
Encoding details here:
https://github.com/apache/parquet-cpp/blob/master/src/parquet/util/rle-encoding.h#L33
I guess the RLE encoding of indices will further improve performance, since
it will not require the costly gather instruction.
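
To make the point about RLE runs concrete, here is a minimal scalar sketch of
decoding the hybrid encoding against a dictionary. It assumes the layout
described in the Parquet format documentation: each run starts with a ULEB128
header whose low bit selects RLE (0) or bit-packed (1); an RLE run stores one
index in ceil(bit_width / 8) little-endian bytes, and a bit-packed run stores
groups of 8 indices packed LSB first. The function name
DecodeHybridWithDictionary and its signature are hypothetical and are not
parquet-cpp API; the int32_t dictionary and a bit width of 0 to 32 are also
assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not parquet-cpp API): decodes RLE/bit-packed hybrid
// dictionary indices and materializes the dictionary values.
// Assumes 0 <= bit_width <= 32.
std::vector<int32_t> DecodeHybridWithDictionary(const std::vector<uint8_t>& buf,
                                                int bit_width,
                                                const std::vector<int32_t>& dict,
                                                size_t num_values) {
  std::vector<int32_t> out;
  out.reserve(num_values);
  size_t pos = 0;
  const size_t value_bytes = static_cast<size_t>(bit_width + 7) / 8;

  while (out.size() < num_values) {
    // Read the ULEB128 run header.
    uint32_t header = 0;
    int shift = 0;
    while (true) {
      if (pos >= buf.size() || shift > 28) throw std::runtime_error("bad header");
      const uint8_t byte = buf[pos++];
      header |= static_cast<uint32_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
      shift += 7;
    }

    if ((header & 1) == 0) {
      // RLE run: one index repeated (header >> 1) times. A single dictionary
      // lookup is enough; the rest is a fill, so no gather is needed.
      if (pos + value_bytes > buf.size()) throw std::runtime_error("truncated run");
      uint32_t index = 0;
      for (size_t i = 0; i < value_bytes; ++i) {
        index |= static_cast<uint32_t>(buf[pos++]) << (8 * i);
      }
      const size_t run =
          std::min<size_t>(header >> 1, num_values - out.size());
      out.insert(out.end(), run, dict.at(index));
    } else {
      // Bit-packed run: (header >> 1) groups of 8 indices, packed LSB first.
      // Every index needs its own lookup (the gather-like part).
      const size_t groups = header >> 1;
      const size_t nbytes = groups * static_cast<size_t>(bit_width);
      if (pos + nbytes > buf.size()) throw std::runtime_error("truncated run");
      for (size_t i = 0; i < groups * 8 && out.size() < num_values; ++i) {
        uint32_t index = 0;
        for (int b = 0; b < bit_width; ++b) {
          const size_t bit = pos * 8 + i * static_cast<size_t>(bit_width) + b;
          index |= static_cast<uint32_t>((buf[bit >> 3] >> (bit & 7)) & 1) << b;
        }
        out.push_back(dict.at(index));
      }
      pos += nbytes;
    }
  }
  return out;
}
```

In this sketch the RLE branch does one lookup per run and then a plain fill,
which in a SIMD version would be a broadcast and store, while the bit-packed
branch still does one lookup per index. This is only an illustration of the
idea, not a vectorized or tuned implementation.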
> [C++] Hardware optimizations for dictionary / RLE encoding/decoding
> -------------------------------------------------------------------
>
> Key: PARQUET-684
> URL: https://issues.apache.org/jira/browse/PARQUET-684
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Wes McKinney
> Assignee: Deepak Majeti
>
> See discussion in
> https://github.com/apache/parquet-cpp/pull/140
> and experiments from Daniel Lemire in
> https://github.com/lemire/dictionary