[ https://issues.apache.org/jira/browse/ARROW-11924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299062#comment-17299062 ]
Weston Pace commented on ARROW-11924:
-------------------------------------
That should be OK. If we reach a point where a consumer needs an
async-reentrant input, i.e. one that can be called again before the previous
future has completed (e.g. if we decide that the fragment generator should
always be async-reentrant), we can add a general-purpose `buffer` operator that
simply buffers incoming requests. Buffering requests is generally assumed to
be cheap: the buffer would not be unbounded, since we never pull with unlimited
reentrancy, so it doesn't add any real memory pressure.
If this is used in scanning, we will need to change Dataset::GetFragmentsAsync
to return a FragmentGenerator instead of a Future<FragmentVector>, but that is
not really a problem either.
> [C++] Provide streaming output from GetFileInfo
> -----------------------------------------------
>
> Key: ARROW-11924
> URL: https://issues.apache.org/jira/browse/ARROW-11924
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 3.0.0
> Reporter: Ben Kietzman
> Assignee: Antoine Pitrou
> Priority: Major
>
> For situations where a monolithic call to GetFileInfo will be slow, it would
> be useful to immediately receive any results which *are* ready through an
> {{AsyncGenerator<std::vector<FileInfo>>}} or so. This is probably a
> prerequisite of ARROW-8163, where the goal is to begin scanning known
> fragments while other fragments are still being discovered.
> IIUC, one concrete example would be paging through a long output from S3's
> ListObjectsV2.
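As a rough sketch of the requested streaming output, assume a hypothetical paged listing call (`ListFn`) loosely modeled on ListObjectsV2 continuation tokens, with `std::future` standing in for Arrow's `Future`. Each call to the generator fetches one page, so entries that *are* ready can be consumed immediately; an empty `std::optional` marks the end of the stream. `FileInfo`, `ListPage`, `MakePagedFileInfoGenerator` and `CollectAll` are illustrative names, not Arrow's API.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::fs::FileInfo.
struct FileInfo {
  std::string path;
};

// Hypothetical result of one listing request, loosely modeled on S3
// ListObjectsV2: a page of entries plus a continuation token.
struct ListPage {
  std::vector<FileInfo> entries;
  std::optional<std::string> next_token;  // empty once the listing is complete
};

// Hypothetical paged listing call; an empty token requests the first page.
using ListFn = std::function<ListPage(const std::optional<std::string>&)>;

// Stand-in for AsyncGenerator<std::vector<FileInfo>>: each call returns a
// future for the next page; an empty optional marks the end of the stream.
using FileInfoGenerator =
    std::function<std::future<std::optional<std::vector<FileInfo>>>()>;

FileInfoGenerator MakePagedFileInfoGenerator(ListFn list) {
  struct State {
    ListFn list;
    std::optional<std::string> token;
    bool done = false;
  };
  auto state = std::make_shared<State>();
  state->list = std::move(list);
  return [state]() {
    // Not async-reentrant: each task reads and updates the shared token, so
    // the caller must wait for one future before calling the generator again.
    return std::async(std::launch::async,
                      [state]() -> std::optional<std::vector<FileInfo>> {
                        if (state->done) return std::nullopt;
                        ListPage page = state->list(state->token);
                        state->token = page.next_token;
                        state->done = !page.next_token.has_value();
                        return std::move(page.entries);
                      });
  };
}

// Drains the generator into one vector, i.e. a monolithic GetFileInfo-style
// result, for callers that still want everything at once.
std::vector<FileInfo> CollectAll(const FileInfoGenerator& gen) {
  std::vector<FileInfo> all;
  while (auto page = gen().get()) {
    all.insert(all.end(), page->begin(), page->end());
  }
  return all;
}
```

`CollectAll` illustrates that moving to a generator does not take anything away from callers that still want a single vector: they can simply drain it.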