paleolimbot commented on PR #13706: URL: https://github.com/apache/arrow/pull/13706#issuecomment-1230223775
A quick summary + reprex to augment the bit I wrote above. This PR (1) undoes the kludges I introduced when getting the user-defined function bits to work without failing the valgrind check, (2) allows nested exec plans with user-defined functions to work, and (3) allows the result of an exec plan to be inspected (e.g., to print/walk its relation tree or calculate its schema). Reprex to play with:

<details>

``` r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(dplyr, warn.conflicts = FALSE)

# The result of an ExecPlan is now a subclass of the RecordBatchReader
# that more carefully manages the lifecycle of the underlying exec plan
# (which includes not starting it until the first batch has been pulled
# and releasing it as soon as it is no longer needed)
result <- mtcars |>
  as_arrow_table() |>
  filter(mpg > 25) |>
  as_record_batch_reader()

result
#> ExecPlanReader
#> <Status: PLAN_NOT_STARTED>
#>
#> mpg: double
#> cyl: double
#> disp: double
#> hp: double
#> drat: double
#> wt: double
#> qsec: double
#> vs: double
#> am: double
#> gear: double
#> carb: double
#>
#> See $metadata for additional Schema metadata
#>
#> See $Plan() for details.

result$Plan()
#> ExecPlan
#> ExecPlan with 4 nodes:
#> 3:SinkNode{}
#> 2:ProjectNode{projection=[mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb]}
#> 1:FilterNode{filter=(mpg > 25)}
#> 0:TableSourceNode{}

result$PlanStatus()
#> [1] "PLAN_NOT_STARTED"

as_arrow_table(result)
#> Table
#> 6 rows x 11 columns
#> $mpg <double>
#> $cyl <double>
#> $disp <double>
#> $hp <double>
#> $drat <double>
#> $wt <double>
#> $qsec <double>
#> $vs <double>
#> $am <double>
#> $gear <double>
#> $carb <double>
#>
#> See $metadata for additional Schema metadata

result$PlanStatus()
#> [1] "PLAN_FINISHED"

# head() on a record batch reader is now fully lazy (i.e., it never
# pulls batches from its source until requested)
endless_reader <- as_record_batch_reader(
  function() stop("this will error if called"),
  schema = schema()
)

head(endless_reader)
#> RecordBatchReader
```

<sup>Created on 2022-08-29 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

</details>
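
For anyone trying to picture the lifecycle described in the reprex comments (nothing starts until the first batch is pulled, and resources are released once the reader is exhausted), here is a rough base-R toy model. It is only a sketch and not the arrow implementation: `new_lazy_reader()`, `read_next()`, and `status()` are hypothetical names made up for illustration, and the intermediate `"PLAN_RUNNING"` state is an assumption (the reprex only shows `PLAN_NOT_STARTED` and `PLAN_FINISHED`).

``` r
# Toy model in base R only; it does not use or mirror arrow internals.
# new_lazy_reader(), read_next() and status() are hypothetical names.
new_lazy_reader <- function(make_batches) {
  status <- "PLAN_NOT_STARTED"
  batches <- NULL  # stands in for whatever a running plan holds on to

  read_next <- function() {
    if (status == "PLAN_NOT_STARTED") {
      # The "plan" only starts here, on the first pull
      batches <<- make_batches()
      status <<- "PLAN_RUNNING"  # assumed name for the in-progress state
    }

    if (length(batches) == 0) {
      # Nothing left to read: release the resources right away
      batches <<- NULL
      status <<- "PLAN_FINISHED"
      return(NULL)
    }

    batch <- batches[[1]]
    batches <<- batches[-1]
    batch
  }

  list(read_next = read_next, status = function() status)
}

# Track whether the source has been touched yet
started <- FALSE
reader <- new_lazy_reader(function() {
  started <<- TRUE
  list(1:3, 4:6)
})
```

Creating `reader` does not call the source function; only the first `reader$read_next()` does, and `reader$status()` reports the finished state once `read_next()` has returned `NULL`.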