etseidl opened a new issue, #8643: URL: https://github.com/apache/arrow-rs/issues/8643
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

- Part of #5853

One of the goals of the Thrift remodel project (#5854) was to enable selective decoding of parts of the Parquet metadata. The parsers are now in place to enable this, but what is lacking is a way to communicate which parts of the metadata are required.

**Describe the solution you'd like**

Some mechanism to communicate to the metadata parsers what is needed. Options could include:

- Skip some statistics fields in `ColumnMetaData` (`Statistics`, `PageEncodingStatistics`, `SizeStatistics`, etc).
- Parse page encoding statistics into some other form (boolean, bitmask) to support dictionary-based pushdown.
- Column projections (i.e. skip decoding metadata for columns that will not be read).
- Row group selection (only parse metadata for the requested set of row groups).
- Only return the schema.
- Skip the schema and use a provided schema (perhaps from an earlier decode).
- Perhaps move encryption parameters here as well.
- Others I haven't yet thought of.

**Describe alternatives you've considered**

These options could be added to the current properties objects, but there doesn't seem to be a single place for all of them. For instance, `SerializedFileReader` takes a `ReadOptions`, which contains a `ReaderProperties` that is subsequently used by the `SerializedRowGroupReader` and its children. On the arrow side we instead use an `ArrowReaderOptions`. The `ParquetMetaDataReader` and `ParquetMetaDataPushDecoder` manage their own sets of options. It would be nice to have a single place to set metadata parsing options and then pass that to the respective decoders.

**Additional context**
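
To make the request concrete, the options listed under the proposed solution could be gathered into one value that every metadata decoder accepts. The following is only a rough sketch of that idea and not an existing arrow-rs API: `MetadataOptions`, `EncodingStatsMode`, `encoding_mask`, and all of the method names are hypothetical, and the sketch deliberately avoids depending on the `parquet` crate. It covers the statistics, projection, row group, and schema-only options; a provided schema and encryption parameters are left out.

```rust
use std::collections::BTreeSet;

/// Hypothetical: how page encoding statistics should be decoded.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum EncodingStatsMode {
    /// Decode the full list of page encoding statistics.
    #[default]
    Full,
    /// Collapse the statistics into a bitmask of the encodings seen.
    Bitmask,
    /// Do not decode them at all.
    Skip,
}

/// Hypothetical set of options shared by all metadata decoders.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct MetadataOptions {
    skip_statistics: bool,
    encoding_stats: EncodingStatsMode,
    /// `None` means "decode metadata for every column".
    columns: Option<BTreeSet<usize>>,
    /// `None` means "decode metadata for every row group".
    row_groups: Option<BTreeSet<usize>>,
    schema_only: bool,
}

impl MetadataOptions {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_skip_statistics(mut self, skip: bool) -> Self {
        self.skip_statistics = skip;
        self
    }

    pub fn with_encoding_stats(mut self, mode: EncodingStatsMode) -> Self {
        self.encoding_stats = mode;
        self
    }

    pub fn with_columns(mut self, cols: impl IntoIterator<Item = usize>) -> Self {
        self.columns = Some(cols.into_iter().collect());
        self
    }

    pub fn with_row_groups(mut self, groups: impl IntoIterator<Item = usize>) -> Self {
        self.row_groups = Some(groups.into_iter().collect());
        self
    }

    pub fn with_schema_only(mut self, schema_only: bool) -> Self {
        self.schema_only = schema_only;
        self
    }

    pub fn skip_statistics(&self) -> bool {
        self.skip_statistics
    }

    pub fn encoding_stats(&self) -> EncodingStatsMode {
        self.encoding_stats
    }

    /// A decoder would call this before parsing a column chunk's metadata.
    pub fn should_decode_column(&self, idx: usize) -> bool {
        !self.schema_only && self.columns.as_ref().map_or(true, |c| c.contains(&idx))
    }

    /// A decoder would call this before parsing a row group's metadata.
    pub fn should_decode_row_group(&self, idx: usize) -> bool {
        !self.schema_only && self.row_groups.as_ref().map_or(true, |g| g.contains(&idx))
    }
}

/// Fold encoding discriminants into a bitmask, one bit per encoding.
/// Assumes discriminants fit in 32 bits; larger values are ignored.
pub fn encoding_mask(encodings: &[u8]) -> u32 {
    encodings
        .iter()
        .filter(|&&e| e < 32)
        .fold(0u32, |mask, &e| mask | (1u32 << e))
}
```

With something along these lines, a caller might build `MetadataOptions::new().with_skip_statistics(true).with_columns([0, 2])` once and hand the same value to whichever reader or push decoder is in use, rather than configuring each one separately.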
