westonpace commented on code in PR #34461:
URL: https://github.com/apache/arrow/pull/34461#discussion_r1127045179
##########
cpp/src/parquet/arrow/reader.h:
##########
@@ -249,6 +249,13 @@ class PARQUET_EXPORT FileReader {
   virtual ::arrow::Status ReadRowGroup(int i, std::shared_ptr<::arrow::Table>* out) = 0;
+  virtual ::arrow::Status WillNeedRowGroups(const std::vector<int>& row_groups,
+                                            const std::vector<int>& column_indices) = 0;
Review Comment:
https://github.com/apache/arrow/pull/14723 adds a filesystem method for
"read many". I would like that filesystem method to support plugging (merging
ranges across small holes) and splitting (breaking up oversized ranges) in the
same way that `ReadRangeCache` does today. `ReadRangeCache` would then only be
needed when you want true caching, and I think we could use the "read many"
method here instead of the `ReadRangeCache`.
This will allow local filesystems to rely on the OS for plugging & splitting
and will allow remote filesystems like S3 to adapt the algorithm to their
needs. The "read many" method is also async and reliably returns a future, so
`WillNeedRowGroups` could return a future as well (I agree that would be
desirable).
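To make "plugging & splitting" concrete, here is a minimal, self-contained sketch of the idea. It is not the `ReadRangeCache` implementation and not the API from the PR linked above; `ByteRange`, `CoalesceRanges`, `hole_size_limit` and `range_size_limit` are hypothetical names chosen for illustration, and the real algorithm may make different trade-offs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical byte range (offset + length); not an Arrow type.
struct ByteRange {
  int64_t offset;
  int64_t length;
};

// Simplified illustration of "plugging & splitting":
//  1. plugging: merge ranges whose gap is at most hole_size_limit bytes
//  2. splitting: cut each merged range into pieces of at most
//     range_size_limit bytes
std::vector<ByteRange> CoalesceRanges(std::vector<ByteRange> ranges,
                                      int64_t hole_size_limit,
                                      int64_t range_size_limit) {
  assert(hole_size_limit >= 0 && range_size_limit > 0);

  // Sort by offset so that neighbouring ranges are adjacent in the vector.
  std::sort(ranges.begin(), ranges.end(),
            [](const ByteRange& a, const ByteRange& b) {
              return a.offset < b.offset;
            });

  // Plugging: extend the previous range when the hole is small enough.
  std::vector<ByteRange> merged;
  for (const ByteRange& r : ranges) {
    if (r.length <= 0) continue;  // ignore empty requests
    if (!merged.empty()) {
      ByteRange& last = merged.back();
      const int64_t last_end = last.offset + last.length;
      if (r.offset - last_end <= hole_size_limit) {
        const int64_t new_end = std::max(last_end, r.offset + r.length);
        last.length = new_end - last.offset;
        continue;
      }
    }
    merged.push_back(r);
  }

  // Splitting: break large merged ranges into bounded-size reads.
  std::vector<ByteRange> out;
  for (const ByteRange& m : merged) {
    for (int64_t pos = 0; pos < m.length; pos += range_size_limit) {
      out.push_back({m.offset + pos,
                     std::min(range_size_limit, m.length - pos)});
    }
  }
  return out;
}
```

A local filesystem could skip this step entirely and let the OS schedule the reads, while a remote filesystem such as S3 could pick limits that suit its request latency and throughput.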