mapleFU opened a new issue, #8378: URL: https://github.com/apache/arrow-rs/issues/8378
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Parquet V2 has a rich set of encodings, but configuring them by hand is a bit hard. One option is an approach like BtrBlocks, which samples some values to decide on the encoding. The most common decision is whether to use dictionary encoding or delta encoding, and sampling can make that decision well.

**Describe the solution you'd like**

Maybe a sampler? DuckDB and Velox have similar mechanisms.

**Describe alternatives you've considered**

Or a dictionary config:

1. Dictionary item bytes
2. Dictionary repeats in sampled rows, e.g. 10000 non-null sampled rows should produce at most 5000 dictionary items when building the dictionary.

It's not perfect, but it's useful in most cases.

**Additional context**

No
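To make the idea concrete, here is a minimal sketch of what such a sampler could look like for an `i64` column. Everything in it is an assumption for illustration: `EncodingChoice`, `SamplerConfig`, `choose_encoding`, and the `max_delta_bits` threshold are hypothetical names and are not part of the arrow-rs or parquet crates. The sketch samples the first `sample_size` non-null values (a contiguous prefix, so that deltas between neighbours stay meaningful), applies the two dictionary limits from the alternative above, and otherwise falls back to a rough delta-width check.

```rust
use std::collections::HashSet;

/// Hypothetical result of the sampling step (not an arrow-rs type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum EncodingChoice {
    Dictionary,
    DeltaBinaryPacked,
    Plain,
}

/// Hypothetical knobs, mirroring the "dictionary config" alternative.
pub struct SamplerConfig {
    /// Number of non-null rows to sample, e.g. 10000.
    pub sample_size: usize,
    /// Max distinct/sampled ratio for dictionary, e.g. 0.5 (5000 of 10000).
    pub max_distinct_ratio: f64,
    /// Max total bytes of dictionary items seen in the sample.
    pub max_dictionary_bytes: usize,
    /// Assumed cutoff: max bit width of the delta range for delta encoding.
    pub max_delta_bits: u32,
}

pub fn choose_encoding(values: &[Option<i64>], cfg: &SamplerConfig) -> EncodingChoice {
    // Sample a contiguous prefix of non-null values.
    let sample: Vec<i64> = values
        .iter()
        .flatten()
        .copied()
        .take(cfg.sample_size)
        .collect();
    if sample.is_empty() {
        return EncodingChoice::Plain;
    }

    // Dictionary check: few distinct items and a small dictionary.
    let distinct: HashSet<i64> = sample.iter().copied().collect();
    let dict_bytes = distinct.len() * std::mem::size_of::<i64>();
    let within_ratio =
        distinct.len() as f64 <= cfg.max_distinct_ratio * sample.len() as f64;
    if within_ratio && dict_bytes <= cfg.max_dictionary_bytes {
        return EncodingChoice::Dictionary;
    }

    // Delta check: how many bits does (max_delta - min_delta) need?
    // i128 arithmetic avoids overflow for extreme i64 inputs.
    let mut min_delta = i128::MAX;
    let mut max_delta = i128::MIN;
    for w in sample.windows(2) {
        let d = w[1] as i128 - w[0] as i128;
        min_delta = min_delta.min(d);
        max_delta = max_delta.max(d);
    }
    if sample.len() >= 2 {
        let range = (max_delta - min_delta) as u128;
        let bits = 128 - range.leading_zeros();
        if bits <= cfg.max_delta_bits {
            return EncodingChoice::DeltaBinaryPacked;
        }
    }

    EncodingChoice::Plain
}
```

In a real writer, the result would presumably be mapped onto the existing per-column writer settings rather than a new enum, and the prefix sample could be swapped for several smaller contiguous blocks spread across the column.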