alamb opened a new issue, #8799:
URL: https://github.com/apache/arrow-rs/issues/8799

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   When reading data from Parquet files, you sometimes want to know the lineage of the rows that come out so you can identify them later.
   
   For example, when implementing delete predicates, you may want to find all rows that match a particular filter condition and remember their row numbers in the file (as Apache Iceberg does, for example).
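
A minimal sketch of the delete-predicate case, assuming the column has already been decoded into a plain slice of `i64` and that every row was read in file order with no pushed-down filter. `matching_row_numbers` is a hypothetical helper, not an arrow-rs API: under these assumptions the row number is simply the row's position in the full scan.

```rust
/// Hypothetical helper: returns the 0-based row numbers (positions within the
/// file) of every row whose value satisfies `predicate`.
/// Assumes `values` holds *all* rows of the file, in file order.
fn matching_row_numbers<F>(values: &[i64], predicate: F) -> Vec<u64>
where
    F: Fn(i64) -> bool,
{
    values
        .iter()
        .enumerate()
        // This only works because no rows were skipped: index == row number.
        .filter(|&(_, &v)| predicate(v))
        .map(|(i, _)| i as u64)
        .collect()
}

fn main() {
    let values = vec![5, 42, 7, 42, 9];
    // Rows to delete: those whose value equals 42.
    let deletes = matching_row_numbers(&values, |v| v == 42);
    println!("{:?}", deletes); // [1, 3]
}
```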
   
   Or you may want to build a secondary index that records the min/max values for each row group or data page.
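
A sketch of what such an index might store, assuming a single `i64` column whose decoded values and per-row-group row counts are already available. `RowGroupStats` and `build_min_max_index` are hypothetical names for illustration only. Each entry is useful only because it can be tied back to a specific row group, which is the lineage information this issue asks the reader to expose.

```rust
/// Hypothetical per-row-group entry in a secondary min/max index.
#[derive(Debug, PartialEq)]
struct RowGroupStats {
    row_group: usize,
    min: i64,
    max: i64,
}

/// Hypothetical helper: splits `values` (all rows, in file order) into row
/// groups of the given sizes and records the min/max of each non-empty group.
fn build_min_max_index(values: &[i64], row_group_sizes: &[usize]) -> Vec<RowGroupStats> {
    // The sizes are assumed to partition `values` exactly.
    assert_eq!(row_group_sizes.iter().sum::<usize>(), values.len());
    let mut index = Vec::new();
    let mut start = 0;
    for (row_group, &len) in row_group_sizes.iter().enumerate() {
        let group = &values[start..start + len];
        start += len;
        // min()/max() return None for an empty group, which is skipped.
        if let (Some(&min), Some(&max)) = (group.iter().min(), group.iter().max()) {
            index.push(RowGroupStats { row_group, min, max });
        }
    }
    index
}

fn main() {
    let values = vec![3, 1, 4, 1, 5, 9, 2, 6];
    // Two row groups: rows 0..3 and rows 3..8.
    for entry in build_min_max_index(&values, &[3, 5]) {
        println!("{:?}", entry);
    }
}
```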
   
   Sometimes this information can be recovered by reading every row of the Parquet file and reconstructing the row number (or row group number), but this is slow. It also breaks down when predicates are applied during the scan: filtered-out rows never reach the caller, so the row numbers have to come from the reader itself.
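
To make the reconstruction concrete: the file-level row number of row `i` in row group `g` is the sum of the row counts of all earlier row groups plus `i`. The sketch below computes those offsets with an exclusive prefix sum, assuming the per-row-group row counts are known (the Parquet footer records a row count for each row group). Both functions are hypothetical. The second one takes the per-row keep/drop mask that only the reader sees during a filtered scan, which illustrates why the numbers must come from the reader.

```rust
/// Hypothetical helper: first file-level row number of each row group,
/// computed as an exclusive prefix sum of the per-row-group row counts.
fn row_group_offsets(row_counts: &[u64]) -> Vec<u64> {
    let mut offsets = Vec::with_capacity(row_counts.len());
    let mut next = 0;
    for &count in row_counts {
        offsets.push(next);
        next += count;
    }
    offsets
}

/// Hypothetical helper: file-level row numbers of the rows in `row_group`
/// that pass a filter. `keep[i]` is true if row `i` of that row group passed.
/// A caller that only receives the surviving rows never sees `keep`, so it
/// cannot recompute these numbers itself.
fn surviving_row_numbers(offsets: &[u64], row_group: usize, keep: &[bool]) -> Vec<u64> {
    let base = offsets[row_group];
    keep.iter()
        .enumerate()
        .filter(|&(_, &k)| k)
        .map(|(i, _)| base + i as u64)
        .collect()
}

fn main() {
    // Three row groups with 4, 3 and 5 rows.
    let offsets = row_group_offsets(&[4, 3, 5]);
    println!("{:?}", offsets); // [0, 4, 7]
    // In row group 1, the predicate keeps rows 0 and 2 of that group.
    let rows = surviving_row_numbers(&offsets, 1, &[true, false, true]);
    println!("{:?}", rows); // [4, 6]
}
```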
   
   Information that is hard to 
   
   **Describe the solution you'd like**
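   Add support for "virtual" columns to the Parquet reader, i.e. columns such as the row number or row group number that are not stored in the file but are produced by the reader itself.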
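
One way a reader could materialize a virtual row-number column is to walk the same skip/select decisions it already makes while scanning and emit the file-level position of each selected row. The sketch below assumes the selection is described as runs of skipped or selected rows starting at a known file-level row; `Run` and `row_number_column` are hypothetical illustrations, not the arrow-rs API, and a real implementation would presumably produce an Arrow array rather than a `Vec<i64>`.

```rust
/// Hypothetical description of a contiguous run of rows that the reader
/// either skips (filtered out) or selects (returned to the caller).
#[derive(Clone, Copy)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Hypothetical helper: values of a virtual "row number" column for the
/// selected rows, given the file-level row number at which the runs start.
fn row_number_column(start_row: i64, runs: &[Run]) -> Vec<i64> {
    let mut out = Vec::new();
    let mut current = start_row;
    for run in runs {
        if !run.skip {
            // Selected rows keep their original position in the file.
            out.extend(current..current + run.row_count as i64);
        }
        // Skipped rows still advance the position counter.
        current += run.row_count as i64;
    }
    out
}

fn main() {
    // A row group whose first row is row 100 of the file:
    // skip 2 rows, select 3, skip 4, select 1.
    let runs = [
        Run { row_count: 2, skip: true },
        Run { row_count: 3, skip: false },
        Run { row_count: 4, skip: true },
        Run { row_count: 1, skip: false },
    ];
    println!("{:?}", row_number_column(100, &runs)); // [102, 103, 104, 109]
}
```

Tracking the position inside the reader, rather than reconstructing it afterwards, keeps the row numbers correct no matter how many rows the predicate removes.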
   
   
   
   **Additional context**
   - [ ] https://github.com/apache/arrow-rs/issues/7299
   

