wgtmac commented on issue #37559: URL: https://github.com/apache/arrow/issues/37559#issuecomment-1835578508
> Why is it necessary to have a RowRanges API? The ::parquet::internal::RecordReader has ReadRecords and SkipRecords APIs, this should be sufficient to read/skip ranges of rows.
>
> This will reduce the burden of converting whatever upstream format of row ranges to the one compatible with what we define here.
>
> P.S. We have considered removing the RecordReader from the internal scope: [#37003 (comment)](https://github.com/apache/arrow/pull/37003#discussion_r1287132438)

IMO, by the time SkipRecords is called it is too late to optimize I/O and decoding. Pushing down row ranges gives a clean separation of concerns and makes it easy to plan the reads before any data is read.
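
As a rough illustration of "planning before reading", the sketch below picks which data pages overlap the requested row ranges before any I/O is issued. It is only a sketch under stated assumptions: `RowRange`, `PageInfo`, and `PlanPages` are hypothetical names invented for this example and are not part of the Arrow or Parquet C++ API, and both inputs are assumed to be sorted and non-overlapping.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not the Arrow/Parquet API.
struct RowRange {
  int64_t first;  // first row index, inclusive
  int64_t last;   // last row index, inclusive
};

struct PageInfo {
  int64_t first_row;  // index of the page's first row within the row group
  int64_t num_rows;   // number of rows in the page (assumed > 0)
};

// Returns the indices of pages that overlap at least one requested range.
// Assumes `pages` and `ranges` are each sorted ascending and non-overlapping.
std::vector<std::size_t> PlanPages(const std::vector<PageInfo>& pages,
                                   const std::vector<RowRange>& ranges) {
  std::vector<std::size_t> selected;
  std::size_t r = 0;
  for (std::size_t p = 0; p < pages.size() && r < ranges.size(); ++p) {
    const int64_t page_first = pages[p].first_row;
    const int64_t page_last = page_first + pages[p].num_rows - 1;
    // Skip ranges that end before this page starts.
    while (r < ranges.size() && ranges[r].last < page_first) ++r;
    // The current range overlaps the page if it starts on or before the
    // page's last row.
    if (r < ranges.size() && ranges[r].first <= page_last) {
      selected.push_back(p);
    }
  }
  return selected;
}
```

With the full set of ranges known up front, a reader could fetch and decode only the selected pages. A reader driven by successive SkipRecords calls only learns about each gap as it reaches it.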
