wesm commented on a change in pull request #6744:
URL: https://github.com/apache/arrow/pull/6744#discussion_r412474320



##########
File path: cpp/src/parquet/file_reader.h
##########
@@ -117,6 +117,15 @@ class PARQUET_EXPORT ParquetFileReader {
   // Returns the file metadata. Only one instance is ever created
   std::shared_ptr<FileMetaData> metadata() const;
 
+  /// Pre-buffer the specified column indices in all row groups.
+  ///
+  /// Only has an effect if ReaderProperties.is_coalesced_stream_enabled is set;
+  /// otherwise this is a no-op. The reader internally maintains a cache which is
+  /// overwritten each time this is called. Intended to increase performance on
+  /// high-latency filesystems (e.g. Amazon S3).
+  void PreBuffer(const std::vector<int>& row_groups,
+                 const std::vector<int>& column_indices);

Review comment:
       On second look, it seems desirable to be able to control when this
operation is invoked. There may also be other options relating to concurrent
IO calls that you will want to pass here.
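
       To make the suggestion concrete, here is a rough sketch of what passing
such options could look like. Everything in it is an assumption made for
illustration: `PreBufferOptions`, its fields, and `MockReader` are hypothetical
names that do not exist in this PR or in Arrow. The mock only models the
behaviour stated in the doc comment (the cache is overwritten on each call); it
performs no IO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical options bundle; these names are illustrative only.
struct PreBufferOptions {
  int max_concurrent_reads = 8;        // assumed cap on in-flight IO calls
  int64_t hole_size_limit = 8 * 1024;  // assumed gap (bytes) below which reads merge
};

// Stand-in for ParquetFileReader. It records which (row group, column)
// pairs would be cached, without touching any file.
class MockReader {
 public:
  void PreBuffer(const std::vector<int>& row_groups,
                 const std::vector<int>& column_indices,
                 const PreBufferOptions& options = PreBufferOptions()) {
    assert(options.max_concurrent_reads > 0);
    options_ = options;
    cached_.clear();  // the cache is overwritten on every call
    for (int rg : row_groups) {
      for (int col : column_indices) {
        cached_.emplace(rg, col);
      }
    }
  }

  bool IsCached(int row_group, int column) const {
    return cached_.count(std::make_pair(row_group, column)) > 0;
  }

  std::size_t cache_size() const { return cached_.size(); }
  const PreBufferOptions& options() const { return options_; }

 private:
  std::set<std::pair<int, int>> cached_;
  PreBufferOptions options_;
};
```

       With an explicit call like this, the caller decides when pre-buffering
happens and can tune concurrency per call rather than through a global reader
property.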




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

