[ 
https://issues.apache.org/jira/browse/ARROW-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502369#comment-17502369
 ] 

Martin du Toit commented on ARROW-15856:
----------------------------------------

Hi [~paleolimbot] 

My problem seems to be the "minio gateway azure" that I'm running. If I
replicate the folder structure locally and use your local minio example
(thanks), I can open the dataset with partitions as expected. This is the same
behavior as with the local file system.
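
For reference, this is roughly the local setup that works for me. It is only a
minimal sketch: the credentials, endpoint, bucket prefix and partition column
names below are placeholders, not the real ones.

{code:r}
library(arrow)

# Local MinIO server (placeholder credentials and endpoint)
minio <- S3FileSystem$create(
  access_key = "minioadmin",
  secret_key = "minioadmin",
  scheme = "http",
  endpoint_override = "localhost:9000"
)

# Root a subtree filesystem at the prefix that contains the partition folders
sfs <- SubTreeFileSystem$create("bucket-name/rawdata/transactions", minio)

# Opening the directory (not a list of files) picks up the sub-folders as partitions
ds <- open_dataset(sfs, format = "csv", partitioning = c("year", "month"))
{code}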

All our files are on Azure Blob Storage, so I unfortunately have to go that route.

Not sure what to do now.

> [R] S3FileSystem - open_dataset
> -------------------------------
>
>                 Key: ARROW-15856
>                 URL: https://issues.apache.org/jira/browse/ARROW-15856
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>    Affects Versions: 7.0.0
>            Reporter: Martin du Toit
>            Priority: Major
>
> Hi
> I can successfully create an S3FileSystem that connects via minio.
> I can create a SubTreeFileSystem:
> s3://investmentaccountingdata/rawdata/transactions/transactions-xxx/v1.1/
> I can list the files in the SubTreeFileSystem, and I can open a dataset from
> the list of files:
> {code:r}
> # list every file under the subtree, then open a dataset from that explicit list
> list_files <- sfs$ls(recursive = TRUE)
> ds <- arrow::open_dataset(sources = list_files, schema = schema_file,
>                           format = csv_format, filesystem = sfs)
> {code}
> This all works fine if I provide the list of files, but I want to specify a
> path higher up so that the sub-folders are included as partitions. The code I
> use works perfectly if I run it on a local disk.
> How can I call open_dataset and give a folder as the source?
>  
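
For the "folder as source" question above, this is the variant I would expect
to work (a sketch only: it assumes the SubTreeFileSystem sfs from the issue,
the partition field names are placeholders, and the schema argument can stay as
in the original call):

{code:r}
# Point open_dataset at the subtree itself instead of an explicit file list,
# so the sub-folder names become partition columns (placeholder field names)
ds <- arrow::open_dataset(
  sources = sfs,
  format = csv_format,
  partitioning = c("portfolio", "year")
)
{code}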


