[ https://issues.apache.org/jira/browse/ARROW-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258929#comment-16258929 ]
DB Tsai commented on ARROW-1830:
--------------------------------

Those are parquet files generated by Spark, written into Hive with S3 storage. It would be great to relax this constraint. Thanks.

> [Python] Error when loading all the files in a directory
> ---------------------------------------------------------
>
>                 Key: ARROW-1830
>                 URL: https://issues.apache.org/jira/browse/ARROW-1830
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>         Environment: Python 2.7.11 (default, Jan 22 2016, 08:29:18)
>                      pyarrow 0.7.1
>            Reporter: DB Tsai
>             Fix For: 0.8.0
>
>
> I can read one parquet file, but when I tried to read all the parquet files
> in a folder, I got an error.
> {code:java}
> >>> data = pq.ParquetDataset('./aaa/part-00000-d8268e3a-4e65-41a3-a43e-01e0bf68ee86')
> >>> data = pq.ParquetDataset('./aaa/')
> Ignoring path: ./aaa//part-00000-d8268e3a-4e65-41a3-a43e-01e0bf68ee86
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 638, in __init__
>     self.validate_schemas()
>   File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 647, in validate_schemas
>     self.schema = self.pieces[0].get_metadata(open_file).schema
> IndexError: list index out of range
> >>>
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)