[ https://issues.apache.org/jira/browse/PARQUET-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Loncaric updated PARQUET-2132:
-------------------------------------
Description:
Quantile Compression (https://github.com/mwlon/quantile-compression) is a
recent but stable compression algorithm for numerical sequences that averages
35%+ higher compression ratio than the next best codec (zstd), given the same
compression time. It has fairly fast decompression speed, close to that of
zstd. Adding q_compress as a column codec for all numerical columns could
substantially reduce the size of most Parquet files.
q_compress is implemented in Rust, which has good interop with C++ and can run
in JVM via JNI (e.g. https://github.com/pancake-db/pancake-scala-client).

was:

Quantile Compression (https://github.com/mwlon/quantile-compression) is a
recent but stable compression algorithm for numerical sequences that averages
35%+ higher compression ratio than the next best codec (zstd), given the same
compression time. It has fairly fast decompression speed, close to that of
zstd. Adding q_compress as a column codec for all numerical columns could
substantially reduce the size of most parquet files.
q_compress is implemented in Rust, which has good interop with C++ and can run
in JVM via JNI (e.g. https://github.com/pancake-db/pancake-scala-client).
> Support Quantile Compression q_compress column codec
> ----------------------------------------------------
>
> Key: PARQUET-2132
> URL: https://issues.apache.org/jira/browse/PARQUET-2132
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-cpp, parquet-format, parquet-mr
> Reporter: Martin Loncaric
> Priority: Major
>
> Quantile Compression (https://github.com/mwlon/quantile-compression) is a
> recent but stable compression algorithm for numerical sequences that averages
> 35%+ higher compression ratio than the next best codec (zstd), given the same
> compression time. It has fairly fast decompression speed, close to that of
> zstd. Adding q_compress as a column codec for all numerical columns could
> substantially reduce the size of most Parquet files.
> q_compress is implemented in Rust, which has good interop with C++ and can
> run in JVM via JNI (e.g. https://github.com/pancake-db/pancake-scala-client).