alamb opened a new issue, #7407: URL: https://github.com/apache/arrow-rs/issues/7407
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

When using block compression there is a tradeoff:
1. Smaller file sizes (and thus potentially more efficient file IO)
2. Longer decoding time (more CPU is required to decode the pages)

Most systems I know of in practice (e.g. DuckDB, DataFusion, InfluxDB 3.0) default to using page-level compression, but the parquet crate defaults to no compression ([source here](https://docs.rs/parquet/latest/src/parquet/file/properties.rs.html#34)).

@XiangpengHao suggests in https://github.com/apache/arrow-rs/issues/7363#issuecomment-2797292029:

> As a side note, I think we should by default enable compression in parquet writer settings. As parquet doesn't have good string encodings, without block compression, string columns are practically almost uncompressed.

**Describe the solution you'd like**

Enable compression by default.

**Describe alternatives you've considered**

One question is whether we should use different default compressions for string and non-string columns (see the sketch under *Additional context* below for how this can already be configured per column).
1. I suggest we follow DuckDB's lead and default to `SNAPPY` compression to balance speed and compression ratio.
2. We could also use `ZSTD`, which DataFusion uses -- that gives higher compression ratios but slower performance.
3. Don't change the default but better document the

**Additional context**
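For context, here is a minimal sketch of how a user opts in to compression today via `WriterProperties`, both crate-wide and per column. It assumes the `parquet` crate built with its `arrow`, `snap`, and `zstd` features plus the `arrow-array` crate; the column names and output path are only for illustration:

```rust
use std::fs::File;
use std::sync::Arc;

use arrow_array::{ArrayRef, Int64Array, RecordBatch, StringArray};
use parquet::arrow::ArrowWriter;
use parquet::basic::{Compression, ZstdLevel};
use parquet::file::properties::WriterProperties;
use parquet::schema::types::ColumnPath;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A batch with a string column (where block compression matters most today)
    // and an integer column.
    let strings: ArrayRef = Arc::new(StringArray::from(vec!["foo", "bar", "baz"]));
    let ints: ArrayRef = Arc::new(Int64Array::from(vec![1, 2, 3]));
    let batch = RecordBatch::try_from_iter([("s", strings), ("i", ints)])?;

    // Compression must be opted into explicitly; the crate-wide default is UNCOMPRESSED.
    let props = WriterProperties::builder()
        // Default compression for all columns (the DuckDB-style choice discussed above).
        .set_compression(Compression::SNAPPY)
        // Per-column override, e.g. heavier compression just for the string column.
        .set_column_compression(ColumnPath::from("s"), Compression::ZSTD(ZstdLevel::try_new(3)?))
        .build();

    let file = File::create("/tmp/example.parquet")?;
    let mut writer = ArrowWriter::try_new(file, batch.schema(), Some(props))?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```

Changing the default would amount to moving the `SNAPPY` (or `ZSTD`) choice above into `WriterProperties::default()` so users get compressed files without any builder call.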
