alippai commented on issue #3520: URL: https://github.com/apache/arrow-rs/issues/3520#issuecomment-1386433094
@tustvold I agree that read performance might not improve, but downstream operations like filter/join/group by (if optimized for REE) would definitely make it worthwhile. Absolutely not a day 0 feature.

The zlib/zstd idea is far-fetched, but:

1.) zlib sometimes decides to store data as Z_RLE; it might be worth checking whether this happens with Parquet too: https://optipng.sourceforge.net/pngtech/z_rle.html
2.) I was wondering if LZ77 -> REE is possible without fully decompressing the data, something like what is posted here (for a different algorithm): https://www.researchgate.net/publication/261072335_From_Run_Length_Encoding_to_LZ78_and_Back_Again

Both of these are super theoretical and need access to the compression building blocks, i.e. the low-level API (you don't want to ask the library to decompress the data; you want to go compressed -> REE directly to save big).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
