[GitHub] [arrow-rs] alippai commented on issue #3520: Implement Run Length Encoding (RLE) / Run End Encoding (REE) support (Epic)

GitBox Tue, 17 Jan 2023 19:29:36 -0800


alippai commented on issue #3520:
URL: https://github.com/apache/arrow-rs/issues/3520#issuecomment-1386433094


   @tustvold I agree that the reading performance might not improve, but the
downstream operations like filter/join/group by (if optimized for REE) would
definitely make it worth it. Absolutely not a day-0 feature.
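To make the downstream-operations point concrete, here is a minimal sketch using a hypothetical, simplified REE layout (two plain `Vec`s, not the arrow-rs array types). It shows a filter on the value that evaluates its predicate once per run rather than once per row, and whose output stays run-end encoded.

```rust
/// Hypothetical, simplified run-end-encoded array (NOT the arrow-rs type):
/// `run_ends[i]` is the exclusive logical end of run `i`,
/// and `values[i]` is the value repeated across that run.
#[derive(Debug, PartialEq)]
struct RunEndEncoded {
    run_ends: Vec<usize>,
    values: Vec<i64>,
}

impl RunEndEncoded {
    /// Logical length: the end of the last run (0 if there are no runs).
    fn len(&self) -> usize {
        self.run_ends.last().copied().unwrap_or(0)
    }

    /// Keep only runs whose value satisfies `pred`, calling `pred` once per run.
    /// Kept runs are re-based so the result is again a valid REE array, and
    /// adjacent kept runs with equal values are merged into one run.
    fn filter_by_value(&self, pred: impl Fn(i64) -> bool) -> RunEndEncoded {
        let mut out = RunEndEncoded { run_ends: Vec::new(), values: Vec::new() };
        let mut start = 0;
        let mut out_len = 0;
        for (&end, &v) in self.run_ends.iter().zip(&self.values) {
            let run_len = end - start;
            start = end;
            if !pred(v) {
                continue; // whole run dropped without touching its rows
            }
            out_len += run_len;
            if out.values.last() == Some(&v) {
                // Same value as the previous kept run: extend it.
                *out.run_ends.last_mut().unwrap() = out_len;
            } else {
                out.values.push(v);
                out.run_ends.push(out_len);
            }
        }
        out
    }

    /// Materialize the logical values (only needed for display/checking).
    fn expand(&self) -> Vec<i64> {
        let mut out = Vec::with_capacity(self.len());
        let mut start = 0;
        for (&end, &v) in self.run_ends.iter().zip(&self.values) {
            out.extend(std::iter::repeat(v).take(end - start));
            start = end;
        }
        out
    }
}

fn main() {
    // Logical array [7, 7, 7, 1, 1, 7, 7, 7, 7] stored as 3 runs.
    let arr = RunEndEncoded { run_ends: vec![3, 5, 9], values: vec![7, 1, 7] };
    // The predicate runs 3 times (once per run), not 9 times.
    let kept = arr.filter_by_value(|v| v == 7);
    // The two kept runs of 7 merge into a single run of length 7.
    println!("{:?} -> {:?}", kept, kept.expand());
}
```

The same idea carries over to group by (aggregate `value * run_length` per run) and, with more bookkeeping, to joins; a real implementation would also have to handle nulls and arbitrary (non-value) filter masks.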
   
   The zlib/zstd idea is far-fetched, but:
   1.) zlib-compressed data is sometimes written with the Z_RLE strategy
(deflate restricted to distance-one matches, i.e. run-length encoding); it
might be worth checking whether this happens with Parquet too:
https://optipng.sourceforge.net/pngtech/z_rle.html
   2.) I was wondering whether LZ77 -> REE is possible without fully
decompressing the data, similar to what is described here (for a different
algorithm, LZ78):
https://www.researchgate.net/publication/261072335_From_Run_Length_Encoding_to_LZ78_and_Back_Again
   
   Both of these are super theoretical and need access to the compression
building blocks, i.e. the low-level API (you don't want to ask the library to
decompress the data; you want to go compressed -> REE directly to save big).
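For the easy case, a deflate stream produced with the Z_RLE strategy contains only literals and matches at distance 1. Each such match simply repeats the previous byte, so it already is a run. The sketch below assumes a hypothetical, already Huffman-decoded `Token` stream. zlib's public inflate API returns decoded bytes, not tokens, which is exactly the low-level access problem mentioned above. Under that assumption, it builds run ends directly without materializing the bytes. Any match at a distance other than 1 makes it bail out (`None`) so the caller can fall back to full decompression.

```rust
/// Hypothetical LZ77 token, as a deflate decoder might see it internally
/// (an assumption for illustration; this is NOT a zlib/flate2 API).
#[derive(Clone, Copy, Debug)]
enum Token {
    Literal(u8),
    Match { distance: usize, length: usize },
}

/// Convert tokens straight into REE form: (run_ends, values).
/// Handles literals and distance-1 matches (all that Z_RLE emits);
/// returns None for any other match, or a match with no preceding byte.
fn tokens_to_ree(tokens: &[Token]) -> Option<(Vec<usize>, Vec<u8>)> {
    let mut run_ends: Vec<usize> = Vec::new();
    let mut values: Vec<u8> = Vec::new();
    let mut len = 0usize; // logical (decompressed) length so far
    for &tok in tokens {
        let (byte, n) = match tok {
            Token::Literal(b) => (b, 1),
            // Distance 1 repeats the last output byte `length` times,
            // and the last output byte is always the value of the last run.
            Token::Match { distance: 1, length } => (*values.last()?, length),
            // General back-references would need real decompression.
            Token::Match { .. } => return None,
        };
        len += n;
        if values.last() == Some(&byte) {
            // Same byte as the current run: just move its end forward.
            *run_ends.last_mut().unwrap() = len;
        } else {
            values.push(byte);
            run_ends.push(len);
        }
    }
    Some((run_ends, values))
}

fn main() {
    // Decodes to "aaaaaa" + "bbbbb" without ever building that buffer.
    let toks = [
        Token::Literal(b'a'),
        Token::Match { distance: 1, length: 5 },
        Token::Literal(b'b'),
        Token::Literal(b'b'),
        Token::Match { distance: 1, length: 3 },
    ];
    println!("{:?}", tokens_to_ree(&toks));
}
```

Matches at larger distances would require copying a slice of earlier runs (possibly splitting runs at the boundaries), which is where the LZ78-style techniques from the linked paper would come in.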


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


