[ https://issues.apache.org/jira/browse/ARROW-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209473#comment-16209473 ]
ASF GitHub Bot commented on ARROW-1213:
---------------------------------------

Github user DrChrisLevy commented on the issue:

https://github.com/apache/arrow/pull/916

Thanks @wesm! I figured it out by looking through the commit changes. If anyone comes across this thread, here is how you can read parquet files from an S3 directory using pyarrow.

**Make sure you have the packages:**
`pip install pyarrow`
`pip install s3fs`

**Python Code:**
```
import s3fs
import pyarrow.parquet as pq

# Replace the placeholders with your own credentials.
access_key = '<aws_access_key_id>'
secret_key = '<aws_secret_access_key>'
fs = s3fs.S3FileSystem(key=access_key, secret=secret_key)

# Suppose you had some parquet files stored in the
# s3 path: s3://my_bucket/my_data/my_favorite_data
bucket = 'my_bucket'
path = 'my_data/my_favorite_data'
bucket_uri = 's3://{bucket}/{path}'.format(bucket=bucket, path=path)

# Read every parquet file under the path into a single pandas DataFrame.
dataset = pq.ParquetDataset(bucket_uri, filesystem=fs)
table = dataset.read()
df = table.to_pandas()
```

> [Python] Enable s3fs to be used with ParquetDataset and reader/writer functions
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-1213
>                 URL: https://issues.apache.org/jira/browse/ARROW-1213
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Yacko
>            Assignee: Wes McKinney
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>
> The pyarrow dataset function can't read from S3 using s3fs as the filesystem.
> Is there a way we can add support for reading partitioned files from S3?
> I am trying to address the problem mentioned in this Stack Overflow question:
> https://stackoverflow.com/questions/45082832/how-to-read-partitioned-parquet-files-from-s3-using-pyarrow-in-python
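
For readers wondering what "partitioned files" means in the issue description, here is a minimal sketch, assuming a Hive-style `key=value` directory layout under the dataset root. The helper name `parse_hive_partitions` and the `year`/`month` directories are hypothetical and used only for illustration; `ParquetDataset` does its own discovery, and this is not its implementation.

```python
def parse_hive_partitions(object_key, root):
    """Return {partition_name: value} for the key=value directories below root.

    Hypothetical helper for illustration only; it is not part of pyarrow or s3fs.
    """
    if not object_key.startswith(root):
        raise ValueError("object_key is not located under root")
    relative = object_key[len(root):].strip('/')
    partitions = {}
    # The last path segment is the file name, so it is skipped.
    for segment in relative.split('/')[:-1]:
        if '=' in segment:
            name, value = segment.split('=', 1)
            partitions[name] = value
    return partitions


# Assumed layout: s3://my_bucket/my_data/my_favorite_data/year=2017/month=10/part-0.parquet
example_key = 'my_data/my_favorite_data/year=2017/month=10/part-0.parquet'
example_partitions = parse_hive_partitions(example_key, 'my_data/my_favorite_data')
print(example_partitions)
```

With a layout like this, reading the dataset root is expected to expose the partition names as extra columns alongside the data stored in the files.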