[
https://issues.apache.org/jira/browse/ARROW-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270940#comment-17270940
]
Laurent commented on ARROW-11120:
---------------------------------
I looked briefly into it, and the issue might be caused by a combination of what
the API in the R arrow package expects and R's performance when creating many R6
objects.
The R constructor for ChunkedArray expects a list of Array objects. In my
example a ChunkedArray has ~2200 chunks. Getting R to build that many dummy
Array objects (`arrow::Array$create(1)`) takes over half a second. If I
multiply this by 18 (the number of columns in my tables), I get slightly over 10
seconds (almost half of the 24 seconds observed).
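As a rough sanity check of these figures, the arithmetic can be sketched in
Python. This is a back-of-the-envelope calculation, not a measurement: the
10-second total is rounded down from "slightly over 10 seconds", and the
variable names are illustrative only.

```python
# Back-of-the-envelope check of the timings quoted in this comment.
CHUNKS_PER_COLUMN = 2200   # "~2200 chunks" in one ChunkedArray
COLUMNS = 18               # number of columns in the tables
TOTAL_OBSERVED_S = 24.0    # total time observed for the conversion
ARRAY_CREATION_S = 10.0    # assumption: "slightly over 10 seconds", rounded down

# Implied cost of building the R Array objects for one column.
per_column_s = ARRAY_CREATION_S / COLUMNS

# Implied cost of creating a single R6 Array object, in milliseconds.
per_chunk_ms = per_column_s / CHUNKS_PER_COLUMN * 1000

# Share of the total runtime spent only on creating Array objects.
share = ARRAY_CREATION_S / TOTAL_OBSERVED_S

print(f"per column:       {per_column_s:.2f} s")
print(f"per Array object: {per_chunk_ms:.2f} ms")
print(f"share of total:   {share:.0%}")
```

Under these assumptions each Array object costs only a fraction of a
millisecond to create, so the overhead comes from the number of objects
(chunks times columns) rather than from any single slow call.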
It feels like a pair of functions `pyarrow.ChunkedArray._export_to_c()` and
`arrow:::ImportChunkedArray()` would be needed.
> [Python][R] Prove out plumbing to pass data between Python and R using rpy2
> ---------------------------------------------------------------------------
>
> Key: ARROW-11120
> URL: https://issues.apache.org/jira/browse/ARROW-11120
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python, R
> Reporter: Wes McKinney
> Priority: Major
>
> Per discussion on the mailing list, we should see what is required (if
> anything) to be able to pass data structures using the C interface between
> Python and R from the perspective of the Python user using rpy2. rpy2 is sort
> of the Python version of reticulate. Unit tests will then validate that it's
> working.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)