lidavidm commented on a change in pull request #9620: URL: https://github.com/apache/arrow/pull/9620#discussion_r591890623
##########
File path: cpp/src/parquet/arrow/reader.h
##########
@@ -175,6 +177,22 @@ class PARQUET_EXPORT FileReader {
       const std::vector<int>& row_group_indices, const std::vector<int>& column_indices,
       std::unique_ptr<::arrow::RecordBatchReader>* out) = 0;
+  /// \brief Return a generator of record batch vectors, where each vector represents
+  /// the contents of a row group from row_group_indices, whose columns are selected
+  /// by column_indices.
+  ///
+  /// An empty optional indicates the end of the generator.
+  ///
+  /// Note that the ordering in row_group_indices and column_indices matter. FileReaders
+  /// must outlive their generators.
+  ///
+  /// \returns error Result if either row_group_indices or column_indices contains an
+  /// invalid index
+  virtual ::arrow::Result<
+      ::arrow::AsyncGenerator<::arrow::util::optional<::arrow::RecordBatchVector>>>
+  GetRecordBatchGenerator(const std::vector<int>& row_group_indices,

Review comment:
Yes, currently scan tasks know which row group index they correspond to. As part of this we may want to make scan tasks less granular than a single row group as discussed.
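
For reference, the contract in the doc comment above (indices validated up front, one vector per requested row group in the order given, an empty optional marking the end) could be sketched roughly as follows. This is a synchronous, standard-library-only illustration, not the Arrow implementation: `Batch`, `BatchVector`, `Generator`, `MakeRowGroupGenerator` and `Drain` are hypothetical names, strings stand in for record batches, and an exception stands in for the error `Result`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are Arrow or Parquet types.
using Batch = std::string;               // stands in for a record batch
using BatchVector = std::vector<Batch>;  // stands in for RecordBatchVector
using Generator = std::function<std::optional<BatchVector>()>;

// Validates every index up front, then returns a generator that yields one
// BatchVector per requested row group, in the order the caller listed them.
// Throws std::out_of_range where the real API would return an error Result.
Generator MakeRowGroupGenerator(int num_row_groups, int num_columns,
                                std::vector<int> row_groups,
                                std::vector<int> columns) {
  for (int rg : row_groups) {
    if (rg < 0 || rg >= num_row_groups) {
      throw std::out_of_range("invalid row group index");
    }
  }
  for (int c : columns) {
    if (c < 0 || c >= num_columns) {
      throw std::out_of_range("invalid column index");
    }
  }
  std::size_t next = 0;
  return [row_groups = std::move(row_groups), columns = std::move(columns),
          next]() mutable -> std::optional<BatchVector> {
    if (next >= row_groups.size()) return std::nullopt;  // end of the stream
    // Describe the row group and its selected columns, in the caller's order.
    std::string batch = "rg" + std::to_string(row_groups[next++]) + ":";
    for (int c : columns) batch += " c" + std::to_string(c);
    return BatchVector{batch};
  };
}

// Pulls from the generator until it returns the empty optional and collects
// every batch that was produced along the way.
BatchVector Drain(Generator gen) {
  assert(gen && "generator must be callable");
  BatchVector out;
  while (auto item = gen()) {
    out.insert(out.end(), item->begin(), item->end());
  }
  return out;
}
```

Calling `Drain(MakeRowGroupGenerator(3, 2, {2, 0}, {1, 0}))` visits row group 2 before row group 0 and column 1 before column 0, which is the ordering dependence the doc comment calls out. The lifetime requirement ("FileReaders must outlive their generators") does not show up here because the sketch captures its state by value.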