DarkWanderer opened a new issue, #9010:
URL: https://github.com/apache/arrow-rs/issues/9010

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Some databases, Grafana Tempo being one example, use column dictionaries 
as makeshift column indexes to speed up ad hoc filtering. Checking whether a 
low-cardinality value is present in a column's dictionary makes it possible to 
pre-filter data effectively by skipping whole row groups.
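
   To illustrate the idea, here is a minimal, self-contained sketch of 
dictionary-based row group pruning. It does not use the `parquet` crate: 
`RowGroupDictionary` and `row_groups_to_scan` are hypothetical names invented 
for this example, and the dictionary contents are modelled as a plain set of 
strings. It assumes that a missing value only rules out a row group when every 
data page of the column chunk is dictionary-encoded, since writers may fall 
back to plain encoding once a dictionary grows too large.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the dictionary of one column chunk in one row group.
/// This is NOT a type from the `parquet` crate.
struct RowGroupDictionary {
    /// Index of the row group within the file.
    row_group: usize,
    /// Distinct values stored in the dictionary page.
    values: HashSet<String>,
    /// False when some data pages fell back to a non-dictionary encoding,
    /// in which case the dictionary does not describe every value.
    fully_dictionary_encoded: bool,
}

impl RowGroupDictionary {
    fn new(row_group: usize, values: &[&str], fully_dictionary_encoded: bool) -> Self {
        Self {
            row_group,
            values: values.iter().map(|v| v.to_string()).collect(),
            fully_dictionary_encoded,
        }
    }
}

/// Returns the row groups that may contain `needle` and therefore must be read.
/// A row group is skipped only if its dictionary is complete and lacks the value.
fn row_groups_to_scan(dicts: &[RowGroupDictionary], needle: &str) -> Vec<usize> {
    dicts
        .iter()
        .filter(|d| !d.fully_dictionary_encoded || d.values.contains(needle))
        .map(|d| d.row_group)
        .collect()
}
```

   With three row groups whose dictionaries are `{checkout, cart}`, `{search}` 
and an incomplete `{search}`, a filter on `checkout` would read only the first 
and third row groups; the third is kept because its dictionary cannot prove 
the value is absent.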
   
   **Describe the solution you'd like**
   
   Add an API to `ParquetRecordBatchStreamBuilder` that allows inspecting the 
contents of the dictionary.
   
   **Describe alternatives you've considered**
   
   Column indexes are expected to have a high size cost and are not always 
available (e.g. for legacy data).
   
   **Additional context**
   
   This information is already accessible through `SerializedFileReader` by 
using the "peekable" page iterator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
