[
https://issues.apache.org/jira/browse/ARROW-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209184#comment-16209184
]
ASF GitHub Bot commented on ARROW-1213:
---------------------------------------
Github user DrChrisLevy commented on the issue:
https://github.com/apache/arrow/pull/916
So if I have an EC2 instance or, say, an EMR cluster on AWS, does this fix
allow pyarrow to read a directory of multiple Parquet files in S3? I still
can't find an example of this fix in action, and I can't get
pq.ParquetDataset("path to s3 directory") working. I have tried importing s3fs
too. Is there an example of using this new feature in the docs? Cheers.
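For reference, a minimal sketch of the usage pattern in question: passing an s3fs filesystem to pq.ParquetDataset. The path "my-bucket/path/to/dataset" is a placeholder, and this assumes s3fs is installed and AWS credentials are available from the environment; it is not taken from the Arrow docs.

```python
# Sketch: reading a partitioned Parquet dataset from S3 with pyarrow + s3fs.
# "my-bucket/path/to/dataset" is a hypothetical placeholder prefix.
def read_s3_parquet_dataset(s3_path):
    import s3fs                   # optional dependency, imported lazily
    import pyarrow.parquet as pq

    fs = s3fs.S3FileSystem()      # uses the default AWS credential chain
    dataset = pq.ParquetDataset(s3_path, filesystem=fs)
    return dataset.read()         # a pyarrow.Table spanning all files

# Usage (against a real bucket):
#   table = read_s3_parquet_dataset("my-bucket/path/to/dataset")
#   df = table.to_pandas()
```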
> [Python] Enable s3fs to be used with ParquetDataset and reader/writer
> functions
> -------------------------------------------------------------------------------
>
> Key: ARROW-1213
> URL: https://issues.apache.org/jira/browse/ARROW-1213
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Yacko
> Assignee: Wes McKinney
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.6.0
>
>
> The pyarrow dataset function can't read from S3 using s3fs as the filesystem.
> Is there a way we can add support for reading partitioned files from S3?
> I am trying to address the problem described in this Stack Overflow question:
> https://stackoverflow.com/questions/45082832/how-to-read-partitioned-parquet-files-from-s3-using-pyarrow-in-python
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)