[jira] [Commented] (ARROW-16133) [R][Python] Convert python dataset to R dataset

Apache Arrow JIRA Bot (Jira) Fri, 30 Sep 2022 10:52:04 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-16133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611699#comment-17611699
 ]


Apache Arrow JIRA Bot commented on ARROW-16133:
-----------------------------------------------

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [R][Python] Convert python dataset to R dataset
> -----------------------------------------------
>
>                 Key: ARROW-16133
>                 URL: https://issues.apache.org/jira/browse/ARROW-16133
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Python, R
>            Reporter: Martin du Toit
>            Assignee: Neal Richardson
>            Priority: Major
>
> Hi. 
> I can open an Arrow dataset from R using reticulate, but I need to use that 
> dataset further in R. How can I convert the Python Arrow dataset to an R Arrow 
> dataset for further processing?
> {code:r}
> reticulate::py_discover_config()
> reticulate::py_available(initialize = TRUE)
> pd <- reticulate::import("pandas", convert = FALSE)
> adlfs <- reticulate::import("adlfs", convert = FALSE)
> pa <- reticulate::import("pyarrow", convert = FALSE)
> pyds <- reticulate::import("pyarrow.dataset", convert = FALSE)
> pafs <- reticulate::import("pyarrow.filesystem", convert = FALSE)
> dl_path <- "investmentaccountingdata/rawdata/transactions/transactions-xxx/v1.1"
> format_name <- "transactions_transactions-xxx_v1.1"
> config <- get_config()
> datalake_secret <- config$get_datalake_secret()
> account_name <- datalake_secret$storname
> account_key <- datalake_secret$storkey
> dm_file_type <- dmfile_create_from_name(format_name = format_name)
> format_all <- dpl_arrow_format_get(dm_file_type)
> fs <- adlfs$AzureBlobFileSystem(account_name = account_name,
>                                 account_key = account_key)
> # Works as expected
> fs$ls("/")
> schema_file <- dpl_arrow_schema_get_dm(dm_file_type, all_char = TRUE,
>                                        pyarrow = pa)
> ds <- pyds$dataset(source = dl_path, filesystem = fs, partitioning = "hive",
>                    format = "csv", schema = schema_file)
> # This works as expected
> files <- ds$files
> files <- reticulate::py_to_r(files)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
