[ https://issues.apache.org/jira/browse/PARQUET-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Loncaric updated PARQUET-2132:
-------------------------------------
    Description:
Quantile Compression (https://github.com/mwlon/quantile-compression) is a recent but stable compression algorithm for numerical sequences that averages 35%+ higher compression ratio than the next best codec (zstd), given the same compression time. It has fairly fast decompression speed, close to that of zstd. Compared to Parquet's built-in PFor-like integer compression algorithm, it achieves a much higher compression ratio at slower speed. Adding q_compress as a column codec for all numerical columns could substantially reduce the size of most Parquet files.

q_compress is implemented in Rust, which has good interop with C++ and can run in JVM via JNI (e.g. https://github.com/pancake-db/pancake-scala-client).

  was:
Quantile Compression (https://github.com/mwlon/quantile-compression) is a recent but stable compression algorithm for numerical sequences that averages 35%+ higher compression ratio than the next best codec (zstd), given the same compression time. It has fairly fast decompression speed, close to that of zstd. Adding q_compress as a column codec for all numerical columns could substantially reduce the size of most Parquet files.

q_compress is implemented in Rust, which has good interop with C++ and can run in JVM via JNI (e.g. https://github.com/pancake-db/pancake-scala-client).


> Support Quantile Compression q_compress column codec
> ----------------------------------------------------
>
>                 Key: PARQUET-2132
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2132
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-cpp, parquet-format, parquet-mr
>            Reporter: Martin Loncaric
>            Priority: Major
>
> Quantile Compression (https://github.com/mwlon/quantile-compression) is a
> recent but stable compression algorithm for numerical sequences that averages
> 35%+ higher compression ratio than the next best codec (zstd), given the same
> compression time. It has fairly fast decompression speed, close to that of
> zstd. Compared to Parquet's built-in PFor-like integer compression algorithm,
> it achieves a much higher compression ratio at slower speed. Adding
> q_compress as a column codec for all numerical columns could substantially
> reduce the size of most Parquet files.
> q_compress is implemented in Rust, which has good interop with C++ and can
> run in JVM via JNI (e.g. https://github.com/pancake-db/pancake-scala-client).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)