bkietz commented on pull request #9482: URL: https://github.com/apache/arrow/pull/9482#issuecomment-781458352
The `once_flag` approach sounds viable in principle. However I'm concerned that it could cause lots of threads to block for a single pre-buffering, then block them all pre-buffering for the next file, etc. For example, a parquet fragment with 42 row groups will generate a scan task for each, and if one of those tasks is scheduled first in each thread (as in Scanner::ToTable) then I think they'd block the whole pool. Fundamentally, it's part of the contract of ScanTask that two ScanTasks are independent, and any shared pre-buffering violates that. I think the correct fix for right now is to write another subclass of `ScanTask` which wraps a whole pre-buffered file reader rather than individual row groups. This will earn us most of the perf advantage of pre-buffering in the common case of scanning many files at the cost of losing parallelism when scanning few files. Eventually we'll use `Future`s to resolve this, probably by replacing ```diff - virtual Result<RecordBatchIterator> Execute() = 0; + virtual AsyncGenerator<std::shared_ptr<RecordBatch>> Execute() = 0; ``` this will enable more fluid handling of the wait-for-pre-buffer: the first record batch from scan tasks which happen to originate from the same file will simply wait a bit longer ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
