[ https://issues.apache.org/jira/browse/ARROW-16133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520690#comment-17520690 ]
Martin du Toit commented on ARROW-16133:
----------------------------------------
Hi [~paleolimbot], yes, as far as I'm aware, Azure Blob Storage isn't
implemented in the R bindings yet.
I will give your suggestion a go. Thanks.
> [R][Python] Convert python dataset to R dataset
> -----------------------------------------------
>
> Key: ARROW-16133
> URL: https://issues.apache.org/jira/browse/ARROW-16133
> Project: Apache Arrow
> Issue Type: Wish
> Components: Python, R
> Reporter: Martin du Toit
> Priority: Major
>
> Hi.
> I can open an Arrow dataset from R using reticulate, but I need to use that
> dataset further in R. How can I convert the Python Arrow dataset to an R Arrow
> dataset for further processing?
> {code:r}
> reticulate::py_discover_config()
> reticulate::py_available(initialize = TRUE)
> pd <- reticulate::import("pandas", convert = FALSE)
> adlfs <- reticulate::import("adlfs", convert = FALSE)
> pa <- reticulate::import("pyarrow", convert = FALSE)
> pyds <- reticulate::import("pyarrow.dataset", convert = FALSE)
> pafs <- reticulate::import("pyarrow.filesystem", convert = FALSE)
> dl_path <- "investmentaccountingdata/rawdata/transactions/transactions-xxx/v1.1"
> format_name <- "transactions_transactions-xxx_v1.1"
> config <- get_config()
> datalake_secret <- config$get_datalake_secret()
> account_name <- datalake_secret$storname
> account_key <- datalake_secret$storkey
> dm_file_type <- dmfile_create_from_name(format_name = format_name)
> format_all <- dpl_arrow_format_get(dm_file_type)
> fs <- adlfs$AzureBlobFileSystem(account_name = account_name,
>                                 account_key = account_key)
> # Works as expected
> fs$ls("/")
> schema_file <- dpl_arrow_schema_get_dm(dm_file_type, all_char = TRUE,
>                                        pyarrow = pa)
> ds <- pyds$dataset(source = dl_path, filesystem = fs, partitioning = "hive",
>                    format = "csv", schema = schema_file)
> # This works as expected
> files <- ds$files
> files <- reticulate::py_to_r(files)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)