[ https://issues.apache.org/jira/browse/KUDU-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated KUDU-2888:
------------------------------
    Attachment: codec-test.py

> Better encoding for dictionary code-words
> -----------------------------------------
>
>                 Key: KUDU-2888
>                 URL: https://issues.apache.org/jira/browse/KUDU-2888
>             Project: Kudu
>          Issue Type: Bug
>          Components: cfile, perf
>            Reporter: Todd Lipcon
>            Priority: Major
>         Attachments: codec-test.py
>
>
> Currently we use bitshuffle for all ints, including dictionary codewords. For
> dictionary codewords, we know the maximum possible value up-front, and we
> also know that the ints will be non-negative and small. This set of
> constraints makes it much better to use a specialized bitpacking algorithm
> rather than a more generic compression like bitshuffle+lz4. Based on some
> quick experiments I ran, we can probably get a several-fold decoding speedup
> with no loss of compression by switching to a codec like simdbitpacking for
> these codewords.
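
To make the idea in the description concrete, here is a minimal pure-Python sketch of fixed-width bitpacking for dictionary codewords. It is not the attached codec-test.py, not Kudu's cfile code, and not the simdbitpacking library; it only illustrates how knowing the maximum codeword up front fixes the bit width per value. The names bit_width, pack_codewords and unpack_codewords are hypothetical, and a real codec would do this with SIMD instructions rather than a Python loop.

```python
def bit_width(max_code: int) -> int:
    """Bits needed to store any value in [0, max_code] (at least 1)."""
    if max_code < 0:
        raise ValueError("max_code must be non-negative")
    return max(1, max_code.bit_length())


def pack_codewords(codes, max_code: int) -> bytes:
    """Pack non-negative codewords into a little-endian bit stream."""
    width = bit_width(max_code)
    acc = 0      # bit accumulator; the low bits are emitted first
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for code in codes:
        if not 0 <= code <= max_code:
            raise ValueError(f"codeword {code} outside [0, {max_code}]")
        acc |= code << nbits
        nbits += width
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial byte, zero-padded
    return bytes(out)


def unpack_codewords(data: bytes, count: int, max_code: int) -> list:
    """Inverse of pack_codewords: read `count` fixed-width codewords."""
    width = bit_width(max_code)
    mask = (1 << width) - 1
    acc = 0
    nbits = 0
    pos = 0
    out = []
    for _ in range(count):
        while nbits < width:
            if pos >= len(data):
                raise ValueError("packed data is too short for count")
            acc |= data[pos] << nbits
            nbits += 8
            pos += 1
        out.append(acc & mask)
        acc >>= width
        nbits -= width
    return out


# A dictionary with 6 entries has codewords 0..5, so 3 bits each suffice.
codes = [0, 5, 2, 4, 1, 3, 5, 0]
packed = pack_codewords(codes, max_code=5)
restored = unpack_codewords(packed, len(codes), max_code=5)
print(len(packed), restored == codes)
```

In this example the eight codewords take 8 * 3 = 24 bits, so they fit in 3 bytes instead of the 32 bytes that eight 32-bit ints would occupy before any general-purpose compression. Decoding is just shifts and masks with a width fixed per block, which is the property that codecs of this kind vectorize.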