[ https://issues.apache.org/jira/browse/ARROW-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615296#comment-17615296 ]
Neal Richardson commented on ARROW-17609:
-----------------------------------------
To clarify: the cost seems to be in instantiating the R6 objects, not in the
calls to C++ themselves. Even so, memoizing, deferring, etc. in these cases
would avoid the trip to C++ that creates each new R6 object.
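
As a rough illustration of the memoize/defer idea, here is a minimal sketch in
Python rather than R. It is not the arrow package's code: the Wrapper and
Expression classes and their attributes are hypothetical stand-ins, and the
counter only exists to show how many wrapper objects get built.

```python
from functools import cached_property


class Wrapper:
    """Hypothetical stand-in for a wrapper object that is costly to build."""

    instances_created = 0  # counts constructions so the saving is visible

    def __init__(self, value):
        Wrapper.instances_created += 1
        self.value = value


class Expression:
    """Hypothetical expression node; not the arrow Expression class."""

    def __init__(self, name, args=()):
        self.name = name
        self.args = tuple(args)

    @cached_property
    def type(self):
        # Memoized: the wrapper is built on first access and then reused.
        return Wrapper(f"type({self.name})")

    @cached_property
    def schema(self):
        # Deferred: nothing is computed unless a caller asks for the schema.
        return Wrapper([arg.name for arg in self.args])


expr = Expression("x + y", [Expression("x"), Expression("y")])
assert Wrapper.instances_created == 0  # nothing is built at construction time

first = expr.type
second = expr.type
assert first is second                 # second access reuses the cached object
assert Wrapper.instances_created == 1  # schema was never requested, never built
```

In this sketch an expression whose schema is never read never pays for it, and
repeated reads of the type build only one wrapper object.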
> [R] Streamline some C++ calls
> -----------------------------
>
> Key: ARROW-17609
> URL: https://issues.apache.org/jira/browse/ARROW-17609
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Neal Richardson
> Assignee: Neal Richardson
> Priority: Major
>
> When looking at profiling data of TPC-H queries on ARROW-17462, there was
> some added overhead (not a ton: tens of ms, but enough to trigger benchmark
> regressions on small data) from the extra expression type calculation. It's
> not a huge deal, but I saw a few places where we could avoid doing
> unnecessary work:
> * Memoize Expression$type calculation
> * Defer Expression$schema determination (calls UnifySchema on expression
> args' schemas), since most expressions never need it (ARROW-13186)
> * Set Expression$scalar type at creation so we don't have to query it
> * Eliminate the .fields() R function and move its logic into the Schema
> constructor, since it creates a bunch of Field R6 objects that are
> immediately dropped