DarkWanderer opened a new issue, #9010: URL: https://github.com/apache/arrow-rs/issues/9010
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Some databases, Grafana Tempo being one example, use column dictionaries as makeshift column indexes to speed up ad-hoc filtering. Checking whether a low-cardinality value is present in a column's dictionary makes it possible to pre-filter data effectively by skipping whole row groups.

**Describe the solution you'd like**

Add an API to `ParquetRecordBatchStreamBuilder` that allows inspecting the contents of the dictionary.

**Describe alternatives you've considered**

Column indexes have a high expected size cost and are not always available (e.g. for legacy data).

**Additional context**

It is already possible to access this information in `SerializedFileReader` by using the "peekable" page iterator.
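To make the pre-filtering idea concrete, here is a minimal sketch of the check itself. It uses only the standard library and does not show any existing or proposed arrow-rs API. It assumes the raw body of a dictionary page for a `BYTE_ARRAY` column has already been obtained by some other means, and that it uses the Parquet PLAIN layout (a 4-byte little-endian length followed by the value bytes for each entry). The function names `decode_plain_byte_array_dict` and `row_group_may_contain` are hypothetical. Note that a dictionary miss only proves absence when every data page in the column chunk is dictionary-encoded, so the caller is assumed to pass `None` whenever that cannot be established.

```rust
/// Decode the body of a PLAIN-encoded BYTE_ARRAY dictionary page.
/// Each entry is a 4-byte little-endian length followed by that many bytes.
/// Returns `None` if the buffer ends before `num_values` entries were read.
/// (Hypothetical helper for illustration, not part of the parquet crate.)
fn decode_plain_byte_array_dict(buf: &[u8], num_values: usize) -> Option<Vec<&[u8]>> {
    let mut entries = Vec::new();
    let mut pos = 0usize;
    for _ in 0..num_values {
        // Read the length prefix; `get` returns None on a truncated buffer.
        let b = buf.get(pos..pos + 4)?;
        let len = u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize;
        pos += 4;
        // Read the value bytes that follow the prefix.
        let end = pos.checked_add(len)?;
        entries.push(buf.get(pos..end)?);
        pos = end;
    }
    Some(entries)
}

/// Decide whether a row group might contain `needle` in the given column.
/// `dict_page` is `(page body, number of dictionary entries)`, or `None` when
/// no usable dictionary is available (assumption: the caller also passes
/// `None` if the column chunk is not fully dictionary-encoded).
/// Returns `false` only when the row group can safely be skipped.
fn row_group_may_contain(dict_page: Option<(&[u8], usize)>, needle: &[u8]) -> bool {
    match dict_page {
        // No dictionary to consult: the row group must be scanned.
        None => true,
        Some((buf, num_values)) => match decode_plain_byte_array_dict(buf, num_values) {
            Some(entries) => entries.iter().any(|e| *e == needle),
            // Malformed dictionary: stay conservative and scan.
            None => true,
        },
    }
}
```

Every uncertain case falls back to `true`, so a reader built on this would at worst scan a row group it could have skipped and would never drop matching rows.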
