westonpace commented on a change in pull request #9995:
URL: https://github.com/apache/arrow/pull/9995#discussion_r612505516
##########
File path: cpp/src/arrow/util/async_generator.h
##########
@@ -1332,4 +1332,49 @@ Result<Iterator<T>> MakeReadaheadIterator(Iterator<T> it, int readahead_queue_si
return MakeGeneratorIterator(std::move(owned_bg_generator));
}
+/// \brief Make a generator that returns a single pre-generated future
Review comment:
FWIW: There are only two spots today where we pull in an async-reentrant
fashion, i.e. call the generator again before the future from the previous
call has finished. The first is in `ReadaheadGenerator::operator()()`, which
has this loop:
```
for (int i = 0; i < max_readahead_; i++) {
  auto next = source_generator_();
  next.AddCallback(mark_finished_if_done_);
  readahead_queue_.push(std::move(next));
}
```
If `source_generator_` is fully synchronous (i.e. it always returns finished
futures), this loop does not add any parallelism (an unfortunate fact I hope to
remedy someday).
However, if `source_generator_` returns unfinished futures, the loop fans out
the tasks. For example, if `source_generator_` reads from a file, the loop
causes up to `max_readahead_` concurrent file reads.
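To make the difference concrete, here is a minimal sketch that deliberately avoids Arrow's real `Future` and generator types. `std::future` stands in for an Arrow future, and `Generator`, `MakeSyncSource`, `MakeAsyncSource`, `IsFinished` and `PullReentrantly` are hypothetical names invented for illustration. A pending `std::promise` models a task that has been started but has not completed.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <queue>
#include <vector>

// Hypothetical stand-in for an async generator: every call returns a future.
using Generator = std::function<std::future<int>()>;

// Fully synchronous source: the value is produced before the call returns,
// so every future it hands out is already finished.
Generator MakeSyncSource() {
  auto next = std::make_shared<int>(0);
  return [next]() {
    std::promise<int> p;
    p.set_value((*next)++);
    return p.get_future();
  };
}

// Asynchronous source: each call starts a "task" (modeled here as a pending
// promise) and returns immediately with an unfinished future.
Generator MakeAsyncSource(
    std::shared_ptr<std::vector<std::promise<int>>> pending) {
  return [pending]() {
    pending->emplace_back();
    return pending->back().get_future();
  };
}

// True if the future already holds a result.
bool IsFinished(const std::future<int>& f) {
  return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Pull max_readahead times without waiting on earlier results (the
// re-entrant pull) and report how many results are still in flight.
std::size_t PullReentrantly(Generator& source, int max_readahead) {
  std::queue<std::future<int>> readahead_queue;
  for (int i = 0; i < max_readahead; i++) {
    readahead_queue.push(source());
  }
  std::size_t in_flight = 0;
  while (!readahead_queue.empty()) {
    if (!IsFinished(readahead_queue.front())) in_flight++;
    readahead_queue.pop();
  }
  return in_flight;
}
```

With the synchronous source nothing is in flight after the loop, because each call did its work before returning. With the asynchronous source all `max_readahead` tasks are outstanding at once, which is the fan-out described above.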
The second spot is in `MergedGenerator`, and I will probably remove it at
some point since it isn't important and causes more headaches than it should.