[
https://issues.apache.org/jira/browse/ARROW-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209184#comment-16209184
]
ASF GitHub Bot commented on ARROW-1213:
---------------------------------------
Github user DrChrisLevy commented on the issue:
https://github.com/apache/arrow/pull/916
So if I have an EC2 instance or, say, an EMR cluster on AWS, does this fix
allow pyarrow to read a directory of multiple Parquet files in S3? I still
can't find an example of this fix in action, and I can't get
pq.ParquetDataset("path to s3 directory") working. I have tried importing s3fs
too. Is there an example of using this new feature in the docs? Cheers.
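For reference, a minimal sketch of the usage pattern in question: passing an s3fs filesystem to pq.ParquetDataset. The path "my-bucket/path/to/dataset" is a placeholder, and this assumes s3fs is installed and AWS credentials are available from the environment; it is not taken from the Arrow docs.

```python
# Sketch: reading a partitioned Parquet dataset from S3 with pyarrow + s3fs.
# "my-bucket/path/to/dataset" is a hypothetical placeholder prefix.
def read_s3_parquet_dataset(s3_path):
    import s3fs                   # optional dependency, imported lazily
    import pyarrow.parquet as pq

    fs = s3fs.S3FileSystem()      # uses the default AWS credential chain
    dataset = pq.ParquetDataset(s3_path, filesystem=fs)
    return dataset.read()         # a pyarrow.Table spanning all files

# Usage (against a real bucket):
#   table = read_s3_parquet_dataset("my-bucket/path/to/dataset")
#   df = table.to_pandas()
```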
> [Python] Enable s3fs to be used with ParquetDataset and reader/writer
> functions
> -------------------------------------------------------------------------------
>
> Key: ARROW-1213
> URL: https://issues.apache.org/jira/browse/ARROW-1213
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Yacko
> Assignee: Wes McKinney
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.6.0
>
>
> The pyarrow dataset function can't read from S3 using s3fs as the filesystem.
> Is there a way we can add support for reading partitioned files from S3?
> I am trying to address the problem described in this Stack Overflow question:
> https://stackoverflow.com/questions/45082832/how-to-read-partitioned-parquet-files-from-s3-using-pyarrow-in-python
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)