[ 
https://issues.apache.org/jira/browse/PARQUET-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497185#comment-15497185
 ] 

Deepak Majeti commented on PARQUET-684:
---------------------------------------

I looked briefly at the code and the blog post. The current implementation 
works for dictionary indices that are bit-packed.
It will have to be extended to support the RLE/bit-packed hybrid encoding 
currently used by parquet-cpp to encode dictionary index values.
Encoding details here: 
https://github.com/apache/parquet-cpp/blob/master/src/parquet/util/rle-encoding.h#L33

I guess the RLE encoding of indices will further improve performance, since 
decoding an RLE run will not require the costly gather instruction.
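
To make the hybrid layout concrete, here is a minimal scalar sketch of a 
decoder, assuming the run format described in the Parquet format 
specification: each run starts with a ULEB128 header whose low bit selects 
RLE (0) or bit-packed (1). DecodeHybridIndices is a hypothetical name for 
illustration, not the parquet-cpp API, and the sketch assumes 
bit_width <= 32 and makes no attempt at SIMD.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of parquet-cpp): decodes a buffer of
// RLE/bit-packed hybrid runs into dictionary indices.
// Assumes 0 <= bit_width <= 32 and stops early on truncated input.
std::vector<uint32_t> DecodeHybridIndices(const std::vector<uint8_t>& buf,
                                          int bit_width,
                                          std::size_t num_values) {
  std::vector<uint32_t> out;
  const std::size_t value_bytes = (static_cast<std::size_t>(bit_width) + 7) / 8;
  std::size_t pos = 0;

  while (out.size() < num_values && pos < buf.size()) {
    // Run header: ULEB128 varint, 7 payload bits per byte.
    uint32_t header = 0;
    int shift = 0;
    while (pos < buf.size()) {
      const uint8_t byte = buf[pos++];
      if (shift < 32) header |= static_cast<uint32_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
      shift += 7;
    }

    if ((header & 1u) == 0) {
      // RLE run: (header >> 1) copies of one value, stored little-endian
      // in ceil(bit_width / 8) bytes. No per-value lookup of the input.
      const std::size_t run_length = header >> 1;
      if (pos + value_bytes > buf.size()) break;
      uint32_t value = 0;
      for (std::size_t i = 0; i < value_bytes; ++i) {
        value |= static_cast<uint32_t>(buf[pos++]) << (8 * i);
      }
      for (std::size_t i = 0; i < run_length && out.size() < num_values; ++i) {
        out.push_back(value);
      }
    } else {
      // Bit-packed run: (header >> 1) groups of 8 values, each value
      // bit_width bits wide, packed starting from the least significant bit.
      const std::size_t groups = header >> 1;
      const std::size_t run_bytes = groups * static_cast<std::size_t>(bit_width);
      if (pos + run_bytes > buf.size()) break;
      std::size_t bit_pos = pos * 8;
      for (std::size_t i = 0; i < groups * 8; ++i) {
        uint32_t value = 0;
        for (int b = 0; b < bit_width; ++b, ++bit_pos) {
          const uint32_t bit = (buf[bit_pos / 8] >> (bit_pos % 8)) & 1u;
          value |= bit << b;
        }
        // Padding values at the end of the last group are dropped.
        if (out.size() < num_values) out.push_back(value);
      }
      pos += run_bytes;
    }
  }
  return out;
}
```

For example, with bit_width = 3 the bytes {0x10, 0x05} encode an RLE run of 
eight indices equal to 5, and {0x03, 0x88, 0xC6, 0xFA} encode one bit-packed 
group holding the indices 0 through 7. In the RLE branch the same index 
repeats for the whole run, so a dictionary decoder could look the value up 
once and broadcast it, which is presumably where the gather is avoided.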


> [C++] Hardware optimizations for dictionary / RLE encoding/decoding
> -------------------------------------------------------------------
>
>                 Key: PARQUET-684
>                 URL: https://issues.apache.org/jira/browse/PARQUET-684
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Deepak Majeti
>
> See discussion in 
> https://github.com/apache/parquet-cpp/pull/140
> and experiments from Daniel Lemire in 
> https://github.com/lemire/dictionary



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
