alamb opened a new issue, #8799: URL: https://github.com/apache/arrow-rs/issues/8799
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

When reading data from Parquet files, you sometimes want to know the lineage of the rows that come out so you can identify them in the future.

For example, when implementing delete predicates, you may want to find all rows that match a particular filter condition and remember their row numbers in the file (e.g. how Iceberg works).

Or you may want to build up a secondary index with information on the min/max values for each row group or data page.

Sometimes this information can be determined by reading all the rows from the Parquet file and reconstructing the row number (or row group number), but this is slow. If predicates are applied during the scan, for example, the row numbers are needed from the reader itself.

Information that is hard to

**Describe the solution you'd like**

Add "virtual" column support to the Parquet reader.

**Additional context**

- [ ] https://github.com/apache/arrow-rs/issues/7299

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
