[ https://issues.apache.org/jira/browse/ARROW-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rok Mihevc updated ARROW-3205:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/19550

> [R] Minimum working example round-tripping a data frame from R to plasma to 
> pandas
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-3205
>                 URL: https://issues.apache.org/jira/browse/ARROW-3205
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>            Reporter: James Lamb
>            Priority: Minor
>
> I see tremendous opportunity for interoperability between Python and R (two 
> popular languages for data scientists) using Arrow as an interchange format.
> To make this concrete and get developers in those languages interested, I 
> think it would be valuable to create a minimum working example of writing an 
> R data frame into plasma and reading it back into *pandas* in a separate 
> Python process, and vice versa.
> I could, for example, envision reading a CSV into a *data.table* in R to 
> do some cleaning and feature engineering, writing that object to *plasma*, 
> then kicking off multiple parallel Python processes to search a space of 
> models. This could demonstrate the benefits of replacing "load this dataset 
> from a file 50 times" with "read off this range of memory in plasma".
>  
> I believe pretty strongly that a tangible example like this would 
> meaningfully improve the R community's interest in and engagement with the 
> Arrow project.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
