paleolimbot commented on PR #13706:
URL: https://github.com/apache/arrow/pull/13706#issuecomment-1230223775

   A quick summary and reprex to augment what I wrote above. This PR (1) 
undoes the kludges I introduced when getting the user-defined function bits to 
work without failing the valgrind check, (2) allows nested exec plans with 
user-defined functions to work, and (3) allows the result of an exec plan to be 
inspected (e.g., to print/walk its relation tree or calculate its schema).
   
   Reprex to play with:
   
   <details>
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   library(dplyr, warn.conflicts = FALSE)
   
   # The result of an ExecPlan is now a subclass of the RecordBatchReader
   # that more carefully manages the lifecycle of the underlying exec plan
   # (which includes not starting it until the first batch has been pulled
   # and releasing it as soon as it is no longer needed)
   result <- mtcars |> 
     as_arrow_table() |> 
     filter(mpg > 25) |> 
     as_record_batch_reader()
   
   result
   #> ExecPlanReader
   #> <Status: PLAN_NOT_STARTED>
   #> 
   #> mpg: double
   #> cyl: double
   #> disp: double
   #> hp: double
   #> drat: double
   #> wt: double
   #> qsec: double
   #> vs: double
   #> am: double
   #> gear: double
   #> carb: double
   #> 
   #> See $metadata for additional Schema metadata
   #> 
   #> See $Plan() for details.
   result$Plan()
   #> ExecPlan
   #> ExecPlan with 4 nodes:
   #> 3:SinkNode{}
   #>   2:ProjectNode{projection=[mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb]}
   #>     1:FilterNode{filter=(mpg > 25)}
   #>       0:TableSourceNode{}
   result$PlanStatus()
   #> [1] "PLAN_NOT_STARTED"
   
   as_arrow_table(result)
   #> Table
   #> 6 rows x 11 columns
   #> $mpg <double>
   #> $cyl <double>
   #> $disp <double>
   #> $hp <double>
   #> $drat <double>
   #> $wt <double>
   #> $qsec <double>
   #> $vs <double>
   #> $am <double>
   #> $gear <double>
   #> $carb <double>
   #> 
   #> See $metadata for additional Schema metadata
   result$PlanStatus()
   #> [1] "PLAN_FINISHED"
   
   # head() on a record batch reader is now fully lazy (i.e., it never
   # pulls batches from its source until they are requested)
   endless_reader <- as_record_batch_reader(
     function() stop("this will error if called"),
     schema = schema()
   )
   
   head(endless_reader)
   #> RecordBatchReader
   ```
   
   <sup>Created on 2022-08-29 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
   
   </details>
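
   The reprex above covers points (1) and (3) but not (2). A minimal sketch of 
what point (2) might look like, assuming a build that includes this PR: the 
function name `times_two` and its body are hypothetical and only for 
illustration, and treating `head()` in the middle of a query as a case that 
produces a nested plan is my assumption rather than something stated above.

   ``` r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)

   # Hypothetical user-defined function: doubles a float64 column.
   # auto_convert = TRUE passes R vectors in and converts the result back.
   register_scalar_function(
     "times_two",
     function(context, x) x * 2,
     in_type = float64(),
     out_type = float64(),
     auto_convert = TRUE
   )

   # Stream a plan that calls the UDF through a record batch reader
   streamed_result <- mtcars |>
     as_arrow_table() |>
     mutate(mpg_doubled = times_two(mpg)) |>
     as_record_batch_reader() |>
     as_arrow_table()

   # Assumption: head() before collect() is one query shape that
   # ends up evaluating one plan inside another
   nested_result <- mtcars |>
     as_arrow_table() |>
     mutate(mpg_doubled = times_two(mpg)) |>
     head(5) |>
     collect()
   ```

   If either call fails with an error about calling into R from an unsupported 
context, the build likely predates this PR.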


