paleolimbot opened a new pull request, #14521:
URL: https://github.com/apache/arrow/pull/14521

   This makes the default `map_batches()` behaviour lazy (i.e., the function is 
called once per batch as each batch arrives):
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   
   source <- RecordBatchReader$create(
     record_batch(a = 1:10),
     record_batch(a = 11:20)
   )
   
   mapped <- map_batches(source, function(x) {
     message("Hi! I'm being evaluated!")
     x
   }, .schema = source$schema)
   
   as_arrow_table(mapped)
   #> Hi! I'm being evaluated!
   #> Hi! I'm being evaluated!
   #> Table
   #> 20 rows x 1 columns
   #> $a <int32>
   ```
   
   <sup>Created on 2022-10-26 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
   
   Lazy evaluation would previously have been a confusing default, since piping 
the resulting `RecordBatchReader` into an `ExecPlan` failed for some plans 
before ARROW-17178 (#13706). This PR commits to the lazy behaviour, which is 
both more efficient and what users are likely to expect.
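   
   For comparison, here is a minimal sketch of the eager path. It assumes the 
`.lazy` argument of `map_batches()` is kept, and that `.lazy = FALSE` calls the 
function on every batch inside `map_batches()` itself rather than when the 
result is consumed:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   source <- RecordBatchReader$create(
     record_batch(a = 1:10),
     record_batch(a = 11:20)
   )
   
   # Assumption: with .lazy = FALSE the function is called for each batch
   # here, before the result is ever consumed.
   eager <- map_batches(source, function(x) {
     message("Hi! I'm being evaluated eagerly!")
     x
   }, .schema = source$schema, .lazy = FALSE)
   
   # Collecting the reader should not need to call the function again.
   tbl <- as_arrow_table(eager)
   ```
   
   Under that assumption the messages should appear at the `map_batches()` 
call, not at `as_arrow_table()`.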

