[
https://issues.apache.org/jira/browse/ARROW-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270975#comment-17270975
]
Laurent commented on ARROW-11120:
---------------------------------
> 2200 chunks sounds really too much for 270K rows. I would expect at most 27
> chunks...
The data is split across several Parquet files (a time series, partitioned by
month). I just fetched the dataset used in an example on the Ursa Labs website:
https://ursalabs.org/arrow-r-nightly/articles/dataset.html
Assuming that the default number of chunks is set per Parquet file, this would
still be 10 times more chunks per file than you would expect.
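For reference, a rough back-of-the-envelope check of the figures quoted in this
thread. This is only a sketch: the helper name is made up for illustration, and
the rows-per-chunk value it derives is merely what "at most 27 chunks" for 270K
rows implies, not a documented Arrow default.

```python
# Figures taken from the discussion above.
TOTAL_ROWS = 270_000      # "270K rows"
OBSERVED_CHUNKS = 2200    # chunks actually seen
EXPECTED_CHUNKS = 27      # "at most 27 chunks"


def rows_per_chunk(total_rows: int, num_chunks: int) -> float:
    """Average number of rows held by each chunk (hypothetical helper)."""
    return total_rows / num_chunks


observed_rows_per_chunk = rows_per_chunk(TOTAL_ROWS, OBSERVED_CHUNKS)
expected_rows_per_chunk = rows_per_chunk(TOTAL_ROWS, EXPECTED_CHUNKS)

print(f"observed: about {observed_rows_per_chunk:.0f} rows per chunk")
print(f"expected: about {expected_rows_per_chunk:.0f} rows per chunk")
```

So the observed chunks hold a little over a hundred rows each on average,
against the roughly ten thousand rows per chunk that the expectation of 27
chunks implies.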
> That said, 18 seconds to create 2200 objects is also completely unexpected.
That's 10 seconds, not 18. Otherwise I believe that we agree that this is not
terribly fast.
> [Python][R] Prove out plumbing to pass data between Python and R using rpy2
> ---------------------------------------------------------------------------
>
> Key: ARROW-11120
> URL: https://issues.apache.org/jira/browse/ARROW-11120
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python, R
> Reporter: Wes McKinney
> Priority: Major
>
> Per discussion on the mailing list, we should see what is required (if
> anything) to be able to pass data structures using the C interface between
> Python and R from the perspective of the Python user using rpy2. rpy2 is sort
> of the Python version of reticulate. Unit tests will then validate that it's
> working.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)