[ 
https://issues.apache.org/jira/browse/ARROW-11924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299062#comment-17299062
 ] 

Weston Pace commented on ARROW-11924:
-------------------------------------

That should be ok.  If we get to a point where a consumer needs an 
async-reentrant input (e.g. we decide that the fragment generator should always 
be async-reentrant) then we can add a general-purpose `buffer` operator that 
simply queues incoming requests.  Buffering requests is generally assumed to 
be cheap: the buffer would not be unbounded, since we never pull with unlimited 
reentrancy, so it doesn't add any real memory pressure.
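
To make the idea concrete, here is a rough, single-threaded sketch of what 
such a `buffer` operator might look like.  It does not use Arrow's real 
Future/AsyncGenerator types: `Item`, `Callback`, `Generator`, 
`BufferedGenerator`, `ManualSource` and `RunDemo` are all hypothetical names 
made up for illustration, and a real version would need locking.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for Arrow's Future/AsyncGenerator:
// pulling a generator registers a callback that later receives the next
// item (std::nullopt would mark the end of the stream).
using Item = std::optional<int>;
using Callback = std::function<void(Item)>;
using Generator = std::function<void(Callback)>;

// Queues incoming pulls and forwards them to a non-reentrant source one at a
// time. Only callbacks are queued, never data, so the queue is bounded by the
// consumer's reentrancy. Single-threaded sketch: no locking.
class BufferedGenerator
    : public std::enable_shared_from_this<BufferedGenerator> {
 public:
  explicit BufferedGenerator(Generator source) : source_(std::move(source)) {}

  // Safe to call again before earlier pulls have completed.
  void Pull(Callback cb) {
    waiting_.push_back(std::move(cb));
    Pump();
  }

 private:
  // Forward the oldest queued request unless the source is still busy.
  void Pump() {
    if (in_flight_ || waiting_.empty()) return;
    in_flight_ = true;
    Callback cb = std::move(waiting_.front());
    waiting_.pop_front();
    auto self = shared_from_this();
    source_([self, cb = std::move(cb)](Item item) {
      self->in_flight_ = false;
      cb(std::move(item));
      self->Pump();  // the source is free again: issue the next request
    });
  }

  Generator source_;
  std::deque<Callback> waiting_;
  bool in_flight_ = false;
};

// Test-only source that asserts it is never pulled while a pull is pending.
struct ManualSource {
  int next = 0;
  Callback pending;

  void Pull(Callback cb) {
    assert(!pending && "source pulled reentrantly");
    pending = std::move(cb);
  }

  // Completes the outstanding pull with the next integer.
  void Deliver() {
    Callback cb = std::move(pending);
    pending = nullptr;
    cb(next++);
  }
};

// Issues three pulls back to back, then lets the source complete them.
std::vector<int> RunDemo() {
  auto source = std::make_shared<ManualSource>();
  auto buffered = std::make_shared<BufferedGenerator>(
      [source](Callback cb) { source->Pull(std::move(cb)); });
  std::vector<int> received;
  for (int i = 0; i < 3; ++i) {
    buffered->Pull([&received](Item item) { received.push_back(*item); });
  }
  for (int i = 0; i < 3; ++i) source->Deliver();
  return received;
}
```

In `RunDemo` the consumer pulls three times before anything is delivered, yet 
the source only ever sees one outstanding request, and the results arrive in 
order.  The queue holds at most as many callbacks as the consumer has 
outstanding pulls, which is why the memory cost should be negligible.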

If this is used in scanning, we will need to change Dataset::GetFragmentsAsync 
to return a FragmentGenerator instead of a Future<FragmentVector>, but that is 
not really a problem either.

> [C++] Provide streaming output from GetFileInfo
> -----------------------------------------------
>
>                 Key: ARROW-11924
>                 URL: https://issues.apache.org/jira/browse/ARROW-11924
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 3.0.0
>            Reporter: Ben Kietzman
>            Assignee: Antoine Pitrou
>            Priority: Major
>
> For situations where a monolithic call to GetFileInfo will be slow, it would 
> be useful to immediately receive any results which *are* ready through an 
> {{AsyncGenerator<std::vector<FileInfo>>}} or so. This is probably a 
> prerequisite of ARROW-8163, where the goal is to begin scanning known 
> fragments while other fragments are still being discovered.
> IIUC, one concrete example would be paging through a long output from S3's 
> ListObjectsV2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
