[ 
https://issues.apache.org/jira/browse/ARROW-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270975#comment-17270975
 ] 

Laurent commented on ARROW-11120:
---------------------------------

> 2200 chunks sounds really too much for 270K rows. I would expect at most 27 
> chunks...

The data is split across several Parquet files (a time series, partitioned by 
month). I simply fetched the dataset used in an example on the Ursa Labs website: 
https://ursalabs.org/arrow-r-nightly/articles/dataset.html

Assuming that the default number of chunks is set per Parquet file, this would 
still be 10 times more chunks per file than you would expect.
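
To make the per-file assumption concrete, here is a rough sketch of the 
arithmetic. The chunk size of 10,000 rows is only inferred from the "27 chunks 
for 270K rows" estimate quoted above, and the 12 evenly sized files are purely 
hypothetical; neither number comes from the actual dataset.

```python
import math


def expected_chunks(rows_per_file, rows_per_chunk):
    """Chunk count if each file is split independently into chunks
    of at most `rows_per_chunk` rows (empty files yield no chunk)."""
    return sum(math.ceil(n / rows_per_chunk) for n in rows_per_file if n > 0)


# Assumption: 10,000 rows per chunk (270K rows / 27 chunks).
rows_per_chunk = 10_000

# Hypothetical layout: 270,000 rows spread evenly over 12 monthly files.
rows_per_file = [22_500] * 12

# One big file versus chunking applied separately to every file.
single = expected_chunks([sum(rows_per_file)], rows_per_chunk)
per_file = expected_chunks(rows_per_file, rows_per_chunk)

print(single, per_file)
```

Under these assumed numbers, per-file chunking adds only a handful of chunks 
over the single-file case, so it would not by itself explain a count in the 
thousands.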

> That said, 18 seconds to create 2200 objects is also completely unexpected.

That's 10 seconds, not 18. Otherwise, I believe we agree that this is not 
terribly fast.

> [Python][R] Prove out plumbing to pass data between Python and R using rpy2
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-11120
>                 URL: https://issues.apache.org/jira/browse/ARROW-11120
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python, R
>            Reporter: Wes McKinney
>            Priority: Major
>
> Per discussion on the mailing list, we should see what is required (if 
> anything) to be able to pass data structures using the C interface between 
> Python and R from the perspective of the Python user using rpy2. rpy2 is sort 
> of the Python version of reticulate. Unit tests will then validate that it's 
> working



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
