wesm commented on a change in pull request #6744:
URL: https://github.com/apache/arrow/pull/6744#discussion_r412474320
########## File path: cpp/src/parquet/file_reader.h ##########
@@ -117,6 +117,15 @@ class PARQUET_EXPORT ParquetFileReader {
   // Returns the file metadata. Only one instance is ever created
   std::shared_ptr<FileMetaData> metadata() const;
+  /// Pre-buffer the specified column indices in all row groups.
+  ///
+  /// Only has an effect if ReaderProperties.is_coalesced_stream_enabled is set;
+  /// otherwise this is a no-op. The reader internally maintains a cache which is
+  /// overwritten each time this is called. Intended to increase performance on
+  /// high-latency filesystems (e.g. Amazon S3).
+  void PreBuffer(const std::vector<int>& row_groups,
+                 const std::vector<int>& column_indices);

Review comment:
On second look, it seems desirable to be able to have control over when this operation is invoked. There might be some other options relating to concurrent IO calls that you might want to pass here.
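The idea behind a coalesced pre-buffer is to take the byte ranges of the requested column chunks and merge ranges that sit close together. The merged ranges can then be issued as fewer, larger reads, which helps on high-latency stores such as S3. The C++ below is a minimal standalone sketch of that technique, not Arrow's implementation. `ByteRange`, `PreBufferOptions`, `CoalesceRanges`, `ScheduleWaves`, and the option names `hole_size_limit`, `range_size_limit`, and `max_concurrent_reads` are all hypothetical. They illustrate the kind of concurrency-related options the review suggests passing to `PreBuffer`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical byte range within the file (not an Arrow/Parquet type).
struct ByteRange {
  int64_t offset;
  int64_t length;
  int64_t end() const { return offset + length; }
};

// Hypothetical options a caller might pass to a PreBuffer-style call.
struct PreBufferOptions {
  // Merge two ranges if the gap between them is at most this many bytes.
  int64_t hole_size_limit = 8 * 1024;
  // Never grow a merged range beyond this many bytes.
  int64_t range_size_limit = 32 * 1024 * 1024;
  // Maximum number of read requests issued at once.
  int max_concurrent_reads = 8;
};

// Sort ranges by offset and merge neighbours whose gap is small enough,
// trading a few wasted bytes for fewer round trips.
std::vector<ByteRange> CoalesceRanges(std::vector<ByteRange> ranges,
                                      const PreBufferOptions& opts) {
  assert(opts.hole_size_limit >= 0 && opts.range_size_limit > 0);
  std::sort(ranges.begin(), ranges.end(),
            [](const ByteRange& a, const ByteRange& b) {
              return a.offset < b.offset;
            });
  std::vector<ByteRange> out;
  for (const ByteRange& r : ranges) {
    if (r.length == 0) continue;  // nothing to read
    if (!out.empty()) {
      ByteRange& last = out.back();
      const int64_t gap = r.offset - last.end();  // negative if overlapping
      const int64_t merged_end = std::max(last.end(), r.end());
      if (gap <= opts.hole_size_limit &&
          merged_end - last.offset <= opts.range_size_limit) {
        last.length = merged_end - last.offset;  // extend the previous read
        continue;
      }
    }
    out.push_back(r);
  }
  return out;
}

// Group the coalesced reads into waves of at most max_concurrent_reads
// requests, so a caller could bound the number of in-flight IO calls.
std::vector<std::vector<ByteRange>> ScheduleWaves(
    const std::vector<ByteRange>& reads, const PreBufferOptions& opts) {
  assert(opts.max_concurrent_reads > 0);
  const std::size_t per_wave =
      static_cast<std::size_t>(opts.max_concurrent_reads);
  std::vector<std::vector<ByteRange>> waves;
  for (std::size_t i = 0; i < reads.size(); ++i) {
    if (i % per_wave == 0) waves.emplace_back();
    waves.back().push_back(reads[i]);
  }
  return waves;
}
```

With `hole_size_limit = 100`, the ranges `{0, 100}` and `{150, 100}` merge into a single read `{0, 250}`, while a range starting at offset 2000 stays separate. Keeping the options in a struct passed at call time, rather than fixed in reader properties, fits the review's point that the caller should decide when and how the IO is issued.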