[
https://issues.apache.org/jira/browse/ARROW-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270998#comment-17270998
]
Laurent commented on ARROW-11120:
---------------------------------
For what it's worth, calling the pyarrow table's `combine_chunks()` to merge
the chunks before conversion results in a significant performance improvement:
conversion takes 75ms instead of 24s after that.
A few comments about the API in relation to this:
- `pyarrow.lib.Table` has a method `combine_chunks()` but there does not seem
to be a way to "re-chunk" (say go from 2,200 chunks to 10 chunks)
- There is no apparent way to specify the number of chunks when creating the
table from a dataset using `to_table()`:
{code:python}
tbl = dataset.to_table(filter=ds.field('tip_amount') > 10)
{code}
The named argument `batch_size` does not appear to have any effect on the
number of chunks.
> [Python][R] Prove out plumbing to pass data between Python and R using rpy2
> ---------------------------------------------------------------------------
>
> Key: ARROW-11120
> URL: https://issues.apache.org/jira/browse/ARROW-11120
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python, R
> Reporter: Wes McKinney
> Priority: Major
>
> Per discussion on the mailing list, we should see what is required (if
> anything) to be able to pass data structures using the C interface between
> Python and R from the perspective of the Python user using rpy2. rpy2 is sort
> of the Python version of reticulate. Unit tests will then validate that it's
> working.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)