somandal commented on issue #7870:
URL: https://github.com/apache/pinot/issues/7870#issuecomment-1190310738

   > I can’t access the document but was byte alignment (rounding the 
dictionary’s bits up to the next multiple of 8, so padding each dictionarized 
value with leading zeros) prior to LZ4 compression attempted? If the dictionary 
codes aren’t byte aligned, byte-oriented compression schemes won’t work well. I 
explained this on a call with @siddharthteotia several months ago.
   
   @richardstartin can you provide a Gmail ID I can share the document with so 
that you can access the results? We hadn't tried byte alignment (rounding the 
bit width up to the next multiple of 8). This morning I ran a few experiments 
to try that out. We haven't implemented a full-fledged index creator that 
compresses the bit-packed dictionary IDs in the forward index. Instead, we use 
an approximate approach: we extract the dictionary-encoded forward index from 
the segment file and compress the whole extracted file with the lz4 / zstd 
command line tools (which is why it's approximate).
   
   When I tried it this morning, the size of the forward index increased after 
rounding up from 18 to 24 bits, as expected. LZ4 compression couldn't bring the 
size of the 24-bit index back down to that of the original 18-bit bit-packed 
index. ZSTD performed better and did reduce the size somewhat (compression 
ratio relative to the original 18-bit dict-encoded forward index: 1.15).
   
   I've updated these findings in the document as well.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

