[ https://issues.apache.org/jira/browse/ARROW-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258929#comment-16258929 ]

DB Tsai commented on ARROW-1830:
--------------------------------

Those are Parquet files generated by Spark and written into Hive with S3 storage.
It would be great to relax this constraint. Thanks.

> [Python] Error when loading all the files in a directory
> --------------------------------------------------------
>
>                 Key: ARROW-1830
>                 URL: https://issues.apache.org/jira/browse/ARROW-1830
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>         Environment: Python 2.7.11 (default, Jan 22 2016, 08:29:18) + pyarrow 0.7.1
>            Reporter: DB Tsai
>             Fix For: 0.8.0
>
>
> I can read one parquet file, but when I tried to read all the parquet files 
> in a folder, I got an error.
> {code:python}
> >>> data = pq.ParquetDataset('./aaa/part-00000-d8268e3a-4e65-41a3-a43e-01e0bf68ee86')
> >>> data = pq.ParquetDataset('./aaa/')
> Ignoring path: ./aaa//part-00000-d8268e3a-4e65-41a3-a43e-01e0bf68ee86
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 638, in __init__
>     self.validate_schemas()
>   File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 647, in validate_schemas
>     self.schema = self.pieces[0].get_metadata(open_file).schema
> IndexError: list index out of range
> >>> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
