pitrou opened a new issue, #38441: URL: https://github.com/apache/arrow/issues/38441
### Describe the enhancement requested

Right now, configuring good encoding values is difficult for users. There is nothing to help them make those decisions, and the defaults are a bit simplistic (try RLE_DICTIONARY, then fall back on PLAIN, IIUC). If they want to override encodings, they have to do so on a column-by-column basis, which probably becomes very cumbersome if there are hundreds of columns.

Ideally, there should be a way for users to get an automatic selection of encodings, based on their data or at least their data types (and also the selected Parquet version), that provides a good compromise between disk footprint and decoding speed. (In Python, think `pq.write_table(..., column_encoding="auto")`.)

Perhaps it would also be nice for users to pass per-datatype preferences, rather than per-column.

### Component(s)

C++, Parquet
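
To illustrate the per-datatype idea, here is a minimal sketch of how type-level preferences could be expanded into the per-column mapping that users have to write by hand today. The helper `encodings_by_type` and the `TYPE_PREFERENCES` table are hypothetical names invented for this example, and the encoding chosen for each type is an illustrative assumption, not a benchmarked recommendation or an existing Arrow feature.

```python
# Hypothetical per-type preferences (illustrative choices, not benchmarked).
# Keys are type names as strings; values are Parquet encoding names.
TYPE_PREFERENCES = {
    "int32": "DELTA_BINARY_PACKED",
    "int64": "DELTA_BINARY_PACKED",
    "float": "BYTE_STREAM_SPLIT",
    "double": "BYTE_STREAM_SPLIT",
    "string": "DELTA_LENGTH_BYTE_ARRAY",
}


def encodings_by_type(columns, preferences, default="PLAIN"):
    """Expand per-type preferences into a per-column encoding mapping.

    `columns` is an iterable of (column_name, type_name) pairs.
    Columns whose type has no stated preference get `default`.
    """
    return {name: preferences.get(type_name, default) for name, type_name in columns}


# Example schema described as (name, type) pairs.
columns = [
    ("id", "int64"),
    ("price", "double"),
    ("name", "string"),
    ("flag", "bool"),
]

encodings = encodings_by_type(columns, TYPE_PREFERENCES)
print(encodings)
```

With PyArrow, the pairs could presumably be built from a table's schema, e.g. `[(f.name, str(f.type)) for f in table.schema]`, and the result passed as `pq.write_table(table, path, use_dictionary=False, column_encoding=encodings)`. Which encodings are accepted for which types depends on the PyArrow and Parquet versions in use, so the mapping above would need to be checked against them.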
