binmahone opened a new pull request, #38867: URL: https://github.com/apache/arrow/pull/38867
### This PR is based on arrow-13 for early review

### Rationale for this change

See https://github.com/apache/arrow/issues/38865

### What changes are included in this PR?

Add a parameter to `GetRecordBatchReader` that accepts `row_ranges`. The row ranges are used to:

1. Skip decompressing and decoding unnecessary pages. This is done by leveraging an existing hook called `DataPageFilter2`.
2. Skip unwanted rows within the pages that do need to be read. This is done by a newly added class called `RecordSkipper`. In Parquet, row numbers are not aligned across different columns' pages, so column readers do NOT share a `RecordSkipper`; each column reader has its own.

### Are these changes tested?

A new test file, `range_reader_test.cc`, is added.

### Are there any user-facing changes?

A new `GetRecordBatchReader` API overload is added. No existing API is broken.
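The two levels of skipping described above (dropping whole pages, then skipping rows inside the pages that must be decoded) can be illustrated with a standalone C++ sketch. It deliberately uses no Arrow or Parquet APIs: `RowRange`, `PageOverlaps`, `Step` and `CollectWantedRows` are hypothetical names, and the `RecordSkipper` below is only a simplified stand-in named after the PR's class, with an interface invented for illustration. The PR's actual implementation and its use of the page filter hook may differ. Row ranges are assumed to be half-open, sorted and non-overlapping, and pages are simulated as fixed-size slices of a single column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical half-open row interval [start, end); assumed sorted and non-overlapping.
struct RowRange {
  int64_t start;
  int64_t end;
};

// Page level: does the page covering [first_row, first_row + num_rows) intersect
// any requested range? If not, the page can be dropped before decompression/decoding.
bool PageOverlaps(const std::vector<RowRange>& ranges, int64_t first_row,
                  int64_t num_rows) {
  const int64_t page_end = first_row + num_rows;
  for (const RowRange& r : ranges) {
    if (r.start < page_end && first_row < r.end) return true;
  }
  return false;
}

// One planning step inside a decoded page: skip `skip` rows, then read `read` rows.
struct Step {
  int64_t skip;
  int64_t read;
};

// Row level (simplified stand-in, hypothetical interface). Each column reader owns
// its own instance because page boundaries differ from column to column.
class RecordSkipper {
 public:
  explicit RecordSkipper(std::vector<RowRange> ranges) : ranges_(std::move(ranges)) {}

  // Plans the next step for the window [row, row + available).
  Step Next(int64_t row, int64_t available) {
    const int64_t window_end = row + available;
    // Drop ranges that end at or before the current row.
    while (idx_ < ranges_.size() && ranges_[idx_].end <= row) ++idx_;
    if (idx_ == ranges_.size() || ranges_[idx_].start >= window_end) {
      return {available, 0};  // nothing wanted in this window: skip all of it
    }
    const int64_t begin = std::max(row, ranges_[idx_].start);
    const int64_t end = std::min(window_end, ranges_[idx_].end);
    return {begin - row, end - begin};
  }

 private:
  std::vector<RowRange> ranges_;
  std::size_t idx_ = 0;
};

// Simulates one column of `total_rows` rows split into pages of `page_size` rows
// and returns the row numbers that would actually be materialized.
std::vector<int64_t> CollectWantedRows(const std::vector<RowRange>& ranges,
                                       int64_t total_rows, int64_t page_size) {
  assert(page_size > 0);
  std::vector<int64_t> out;
  RecordSkipper skipper(ranges);
  for (int64_t first = 0; first < total_rows; first += page_size) {
    const int64_t n = std::min(page_size, total_rows - first);
    if (!PageOverlaps(ranges, first, n)) continue;  // page is never decoded
    int64_t row = first;
    int64_t remaining = n;
    while (remaining > 0) {
      const Step step = skipper.Next(row, remaining);
      row += step.skip;
      remaining -= step.skip;
      for (int64_t i = 0; i < step.read; ++i) out.push_back(row + i);
      row += step.read;
      remaining -= step.read;
    }
  }
  return out;
}
```

For example, with ranges `[1, 3)` and `[9, 10)` over 12 rows in 4-row pages, the middle page (rows 4 to 7) fails `PageOverlaps` and is never decoded, while the other two pages are decoded and trimmed to rows 1, 2 and 9. Keeping the page check and the in-page skipping separate mirrors the split described above: the first avoids decompression cost entirely, and the second handles ranges that start or end in the middle of a page.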
