pitrou opened a new issue, #40636: URL: https://github.com/apache/arrow/issues/40636
### Describe the enhancement requested

Currently, the choice of default encoding for a non-dictionary data page is trivial. It happens in two places:

1. in the `FallbackToPlainEncoding` function for columns for which dictionary encoding is attempted: https://github.com/apache/arrow/blob/5718a2862b4254d8bf938912d8958837ac7313a5/cpp/src/parquet/column_writer.cc#L1567-L1580
2. in the `ColumnWriter::Make` factory function for columns for which dictionary encoding is not attempted: https://github.com/apache/arrow/blob/5718a2862b4254d8bf938912d8958837ac7313a5/cpp/src/parquet/column_writer.cc#L2375-L2382

I'll note that parquet-mr does not limit dictionary encoding fallback to PLAIN, even for "v1" Parquet files: https://github.com/apache/parquet-mr/blob/95b004c3df473e3ab0963dc5136934ce5235d5df/parquet-column/src/main/java/org/apache/parquet/column/values/factory/DefaultV1ValuesWriterFactory.java#L124-L139

We should probably consolidate the logic from the two functions above and make it more sophisticated, allowing the best encoding for the selected Parquet version.

Also related: https://github.com/apache/arrow/issues/38441

### Component(s)

C++, Parquet