Will Jones created ARROW-16703:
----------------------------------

             Summary: [R] Refactor map_batches() so it can stream results
                 Key: ARROW-16703
                 URL: https://issues.apache.org/jira/browse/ARROW-16703
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
    Affects Versions: 8.0.0
            Reporter: Will Jones
             Fix For: 9.0.0


As part of ARROW-15271, {{map_batches()}} was modified to return a 
{{RecordBatchReader}}, but the implementation still collects all results into 
a list of record batches and only then wraps that list in a reader, so nothing 
is actually streamed. In theory, if we push the implementation down to C++, we 
should be able to return a proper streaming {{RecordBatchReader}}.
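
To illustrate the difference, here is a minimal sketch in plain Python. It is 
not the Arrow R or C++ API: {{Batch}}, {{map_batches_eager()}} and 
{{map_batches_streaming()}} are hypothetical stand-ins, with a dict of columns 
playing the role of a record batch and an iterator playing the role of a 
reader.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for a record batch: column name -> list of values.
Batch = dict


def map_batches_eager(
    batches: Iterable[Batch], fn: Callable[[Batch], Batch]
) -> Iterator[Batch]:
    """Apply fn to every batch up front, then return an iterator over the list."""
    results: List[Batch] = [fn(b) for b in batches]  # all work happens here
    return iter(results)


def map_batches_streaming(
    batches: Iterable[Batch], fn: Callable[[Batch], Batch]
) -> Iterator[Batch]:
    """Apply fn to one batch at a time, only when the consumer asks for it."""
    for b in batches:
        yield fn(b)
```

With the eager version, every batch has been processed (and is held in memory) 
before the caller sees the first result. With the streaming version, {{fn}} 
runs once per batch pulled by the consumer.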

One complication is that we won't know the output schema ahead of time. We 
could optionally accept it as an argument, which would allow the function to 
be fully lazy. Or we could eagerly evaluate just the first batch to determine 
the schema.
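
Both schema options can be sketched together. Again this is plain Python under 
stated assumptions, not the Arrow API: {{Batch}}, {{Schema}}, 
{{infer_schema()}} and {{map_batches_with_schema()}} are hypothetical names, 
and a "schema" here is just a tuple of column names.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical stand-ins: a batch is a dict of columns, a schema is the
# ordered tuple of its column names.
Batch = dict
Schema = tuple


def infer_schema(batch: Batch) -> Schema:
    """Derive the stand-in schema from a single batch."""
    return tuple(batch.keys())


def map_batches_with_schema(
    batches: Iterable[Batch],
    fn: Callable[[Batch], Batch],
    schema: Optional[Schema] = None,
) -> Tuple[Schema, Iterator[Batch]]:
    """Return (schema, lazy iterator of mapped batches)."""
    mapped = (fn(b) for b in batches)  # generator: nothing has run yet

    if schema is not None:
        # Option 1: the caller supplied the schema, so stay fully lazy.
        return schema, mapped

    # Option 2: evaluate only the first batch to learn the schema, then
    # put that batch back in front of the remaining lazy stream.
    try:
        first = next(mapped)
    except StopIteration:
        raise ValueError(
            "cannot infer a schema from empty input; pass schema explicitly"
        ) from None
    return infer_schema(first), chain([first], mapped)
```

When a schema is passed, {{fn}} is not called until the consumer pulls a batch. 
Without one, {{fn}} is called exactly once during construction, and the first 
result is still delivered to the consumer rather than discarded. An empty 
input is the one case where inference cannot work, which is why the sketch 
raises an error there.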



--
This message was sent by Atlassian Jira
(v8.20.7#820007)