[GitHub] [arrow] lidavidm commented on a change in pull request #9620: ARROW-11843: [C++] Provide reentrant Parquet reader

GitBox Wed, 10 Mar 2021 13:38:28 -0800


lidavidm commented on a change in pull request #9620:
URL: https://github.com/apache/arrow/pull/9620#discussion_r591890623




##########
File path: cpp/src/parquet/arrow/reader.h
##########
@@ -175,6 +177,22 @@ class PARQUET_EXPORT FileReader {
      const std::vector<int>& row_group_indices, const std::vector<int>& column_indices,
       std::unique_ptr<::arrow::RecordBatchReader>* out) = 0;
 
+  /// \brief Return a generator of record batch vectors, where each vector represents
+  ///     the contents of a row group from row_group_indices, whose columns are selected
+  ///     by column_indices.
+  ///
+  /// An empty optional indicates the end of the generator.
+  ///
+  /// Note that the ordering in row_group_indices and column_indices matter. FileReaders
+  /// must outlive their generators.
+  ///
+  /// \returns error Result if either row_group_indices or column_indices contains an
+  ///     invalid index
+  virtual ::arrow::Result<
+      ::arrow::AsyncGenerator<::arrow::util::optional<::arrow::RecordBatchVector>>>
+  GetRecordBatchGenerator(const std::vector<int>& row_group_indices,

Review comment:
       Yes, currently scan tasks know which row group index they correspond to.
       As part of this we may want to make scan tasks less granular than a single
       row group, as discussed.
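
For readers skimming the thread, the contract described in the doc comment above (one vector of batches per requested row group, an empty optional marking the end, index order preserved, invalid indices rejected up front) can be sketched roughly as follows. This is a simplified, synchronous stand-in and not the Arrow API: `ToyFile`, `Batch`, `BatchVector`, `BatchGenerator` and `MakeGenerator` are hypothetical names, batches are plain strings, and an exception is assumed in place of an error `Result`.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a "batch" is only a label, not an arrow::RecordBatch.
using Batch = std::string;
using BatchVector = std::vector<Batch>;
// Synchronous simplification of an async generator: each call yields the
// next item, and an empty optional marks the end of the stream.
using BatchGenerator = std::function<std::optional<BatchVector>()>;

// Toy description of a file: only the number of row groups and columns.
struct ToyFile {
  int num_row_groups;
  int num_columns;
};

// Validates every index first (assumption: throw instead of returning an
// error Result), then returns a generator that walks row_groups in the
// order given and labels each batch with the columns in the order given.
BatchGenerator MakeGenerator(const ToyFile& file, std::vector<int> row_groups,
                             std::vector<int> columns) {
  for (int rg : row_groups) {
    if (rg < 0 || rg >= file.num_row_groups) {
      throw std::out_of_range("invalid row group index");
    }
  }
  for (int c : columns) {
    if (c < 0 || c >= file.num_columns) {
      throw std::out_of_range("invalid column index");
    }
  }
  std::size_t next = 0;
  return [row_groups = std::move(row_groups), columns = std::move(columns),
          next]() mutable -> std::optional<BatchVector> {
    if (next >= row_groups.size()) {
      return std::nullopt;  // end of the generator
    }
    const int rg = row_groups[next++];
    std::string label = "rg" + std::to_string(rg) + ":";
    for (int c : columns) {
      label += " c" + std::to_string(c);
    }
    // One batch per row group in this toy; a real row group may yield several.
    return BatchVector{label};
  };
}
```

Calling `MakeGenerator(file, {2, 0}, {1, 3})` on a toy file with three row groups yields row group 2 first, then row group 0, then an empty optional. Unlike the real reader, this toy copies its inputs into the closure, so it does not model the "FileReaders must outlive their generators" requirement.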




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
