westonpace opened a new pull request #11616:
URL: https://github.com/apache/arrow/pull/11616
**This is still very much a WIP**
This PR attempts to address several issues:
* Memory mapped IPC reads always call WillNeed on the data and the user has no way to avoid this
* Projection pushdown is only available in the synchronous API
* Coalescing / readahead is only available via the generators API
* There is a lot of duplicate code in the generators path
It adds two new methods to RecordBatchFileReader:
```
/// \brief Begin loading metadata for the desired batches into memory.
///
/// This method will also begin loading all dictionary messages into memory.
///
/// For a regular file this will immediately begin disk I/O in the background on a
/// thread on the IOContext's thread pool. If the file is memory mapped this will
/// ensure the memory needed for the metadata is paged from disk into memory.
///
/// \param indices Indices of the batches to prefetch
///                If empty then all batches will be prefetched.
virtual Status WillNeedMetadata(const std::vector<int>& indices) = 0;

/// \brief Begin loading metadata for the desired batches into memory and indicate
/// that the data itself should be prefetched when it is requested.
///
/// This method should not be called in combination with WillNeedMetadata. If you want
/// to prefetch the data then use this method. If you do not want to prefetch the data
/// (because you are only accessing a small number of items in the batch's arrays) then
/// you should use WillNeedMetadata.
///
/// This method will immediately start the I/O for the metadata and dictionaries.
///
/// This method will not immediately start the I/O for the data. The data I/O will be
/// started when you call ReadRecordBatch.
///
/// If you want to read multiple batches in parallel then you can make concurrent calls
/// to ReadRecordBatch or ReadRecordBatchAsync.
/// \param indices
/// \return
virtual Status WillNeedBatches(const std::vector<int>& indices) = 0;
```
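
For illustration, here is a minimal usage sketch of the proposed API. `WillNeedBatches` and its prefetch semantics come from this PR; `RecordBatchFileReader::Open` and `ReadRecordBatch` are the existing Arrow C++ APIs, and the `ReadSomeBatches` helper is a hypothetical caller:

```
// Minimal sketch, assuming the proposed WillNeedBatches method from this PR.
#include <memory>
#include <string>
#include <vector>

#include "arrow/io/file.h"
#include "arrow/ipc/reader.h"
#include "arrow/record_batch.h"
#include "arrow/result.h"
#include "arrow/status.h"

arrow::Status ReadSomeBatches(const std::string& path,
                              const std::vector<int>& indices) {
  // Open the IPC file (a memory-mapped file would be used the same way).
  ARROW_ASSIGN_OR_RAISE(auto file, arrow::io::ReadableFile::Open(path));
  ARROW_ASSIGN_OR_RAISE(auto reader,
                        arrow::ipc::RecordBatchFileReader::Open(file));

  // Start metadata/dictionary I/O now and mark the batch data for prefetch.
  ARROW_RETURN_NOT_OK(reader->WillNeedBatches(indices));

  // Data I/O for each batch starts when it is actually requested; calls
  // could also be issued concurrently to read batches in parallel.
  for (int index : indices) {
    ARROW_ASSIGN_OR_RAISE(auto batch, reader->ReadRecordBatch(index));
    // ... process batch ...
  }
  return arrow::Status::OK();
}
```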