[ 
https://issues.apache.org/jira/browse/KUDU-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated KUDU-2888:
------------------------------
    Attachment: codec-test.py

> Better encoding for dictionary code-words
> -----------------------------------------
>
>                 Key: KUDU-2888
>                 URL: https://issues.apache.org/jira/browse/KUDU-2888
>             Project: Kudu
>          Issue Type: Bug
>          Components: cfile, perf
>            Reporter: Todd Lipcon
>            Priority: Major
>         Attachments: codec-test.py
>
>
> Currently we use bitshuffle for all ints, including dictionary codewords. For 
> dictionary codewords, we know the maximum possible value up front, and we 
> also know that the ints will be non-negative and small. These constraints 
> favor a specialized bitpacking algorithm over a more generic compression 
> scheme like bitshuffle+lz4. Based on some quick experiments I ran, we can 
> probably get a several-fold decoding speedup with no loss of compression by 
> switching to a codec like simdbitpacking for these codewords.
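
To illustrate the idea in the description, here is a minimal scalar sketch of fixed-width bitpacking, where the width is derived from the largest possible codeword (dictionary size minus one). It is not Kudu code, not the attached codec-test.py, and not the simdbitpacking library; the names `bit_width`, `pack`, and `unpack` are hypothetical, and a real codec would use SIMD rather than a Python loop.

```python
def bit_width(max_value: int) -> int:
    """Bits needed to store any value in [0, max_value] (at least 1)."""
    return max(1, max_value.bit_length())


def pack(codes, width: int) -> bytes:
    """Pack non-negative ints into a little-endian stream of `width`-bit fields."""
    mask = (1 << width) - 1
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for code in codes:
        if code < 0 or code > mask:
            raise ValueError(f"codeword {code} does not fit in {width} bits")
        acc |= code << nbits
        nbits += width
        # Flush every complete byte.
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial byte, zero-padded
    return bytes(out)


def unpack(data: bytes, width: int, count: int) -> list:
    """Inverse of pack(): read `count` fields of `width` bits each."""
    mask = (1 << width) - 1
    acc = 0
    nbits = 0
    pos = 0
    out = []
    for _ in range(count):
        # Refill the accumulator until a whole field is available.
        while nbits < width:
            if pos >= len(data):
                raise ValueError("packed data is too short")
            acc |= data[pos] << nbits
            pos += 1
            nbits += 8
        out.append(acc & mask)
        acc >>= width
        nbits -= width
    return out


# Hypothetical example: a dictionary with 300 entries, so codewords are 0..299.
codes = [0, 5, 299, 17, 298]
width = bit_width(299)
packed = pack(codes, width)
restored = unpack(packed, width, len(codes))
```

Because the width is fixed by the dictionary size, decoding is shifts and masks with no entropy-decoding step, which is the property a SIMD bitpacking codec exploits.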



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
