[ https://issues.apache.org/jira/browse/ARROW-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148632#comment-16148632 ]

Brecht Machiels commented on ARROW-1429:
----------------------------------------

Thanks, Wes!

Looking at the list of open issues for 0.7.0, I assume the next release is still some way off? I'd rather not spend time building a patched package if 0.7.0 will be available next week.

> [Python] Error loading parquet file with _metadata from HDFS
> ------------------------------------------------------------
>
>                 Key: ARROW-1429
>                 URL: https://issues.apache.org/jira/browse/ARROW-1429
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.6.0
>         Environment: RHEL 6.8, Python 3.5.4 (Anaconda), Hadoop 2.6.0-cdh5.8.3
>            Reporter: Brecht Machiels
>            Assignee: Brecht Machiels
>             Fix For: 0.7.0
>
>
> I can open tables stored on HDFS as long as there is no _metadata file alongside the Parquet files.
> For two tables that have a _metadata file, I get the following traceback:
> {code}
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/home/bmachie/Documents/ml_irissearch/python/util.py", line 199, in read_table
>     pq_table = read_hdfs_parquet(hdfs_path, columns)
>   File "/home/bmachie/Documents/ml_irissearch/python/util.py", line 251, in read_hdfs_parquet
>     return HDFS_CONNECTION.read_parquet(hdfs_path, columns)
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/filesystem.py", line 168, in read_parquet
>     filesystem=self)
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/parquet.py", line 535, in __init__
>     self.common_metadata = ParquetFile(self.metadata_path).metadata
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/parquet.py", line 54, in __init__
>     self.reader.open(source, metadata=metadata)
>   File "_parquet.pyx", line 398, in pyarrow._parquet.ParquetReader.open
>   File "io.pxi", line 705, in pyarrow.lib.get_reader
>   File "io.pxi", line 472, in pyarrow.lib.memory_map
>   File "io.pxi", line 451, in pyarrow.lib.MemoryMappedFile._open
>   File "error.pxi", line 72, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Failed to open local file: hdfs://nameservice1/path/to/table/_metadata
> {code}
> For another table with a _metadata file:
> {code}
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/home/bmachie/Documents/ml_irissearch/python/util.py", line 199, in read_table
>     pq_table = read_hdfs_parquet(hdfs_path, columns)
>   File "/home/bmachie/Documents/ml_irissearch/python/util.py", line 251, in read_hdfs_parquet
>     return HDFS_CONNECTION.read_parquet(hdfs_path, columns)
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/filesystem.py", line 168, in read_parquet
>     filesystem=self)
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/parquet.py", line 548, in __init__
>     self.validate_schemas()
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/parquet.py", line 557, in validate_schemas
>     self.schema = self.pieces[0].get_metadata(open_file).schema
> IndexError: list index out of range
> {code}
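In the first traceback, the `_metadata` path string is passed straight to `ParquetFile`, and it ends up in `pyarrow.lib.get_reader` and `pyarrow.lib.memory_map`. In other words, an `hdfs://` URI is opened as if it were a file on the local disk, which explains "Failed to open local file". The sketch below illustrates that failure mode and the general remedy: open the file through the dataset's filesystem object rather than the local OS. It assumes a hypothetical in-memory stand-in for HDFS. `FakeHdfs`, `open_as_local` and `open_source` are illustrative names, not pyarrow APIs, and this is not the actual ARROW-1429 patch.

```python
import io


class FakeHdfs:
    """Hypothetical in-memory stand-in for an HDFS client (not a pyarrow class)."""

    def __init__(self, files):
        # Map of full hdfs:// URIs to file contents (bytes)
        self._files = dict(files)

    def open(self, path, mode="rb"):
        # Resolve the URI against this filesystem object, not the local disk
        return io.BytesIO(self._files[path])


def open_as_local(path):
    # Mirrors the failing code path: a str path is treated as a local file,
    # so an "hdfs://..." URI is looked up on the local disk and not found.
    return open(path, "rb")


def open_source(path, filesystem=None):
    # General remedy: if a filesystem object is available, open through it
    if filesystem is not None:
        return filesystem.open(path, "rb")
    return open_as_local(path)


if __name__ == "__main__":
    uri = "hdfs://nameservice1/path/to/table/_metadata"
    # Dummy bytes only; this is not a real Parquet footer
    fs = FakeHdfs({uri: b"dummy metadata bytes"})

    try:
        open_as_local(uri)
    except OSError as exc:
        print("local open failed:", type(exc).__name__)

    with open_source(uri, filesystem=fs) as f:
        print("opened via filesystem:", f.read())
```

The second traceback is a different symptom. `self.pieces` is empty, so the dataset found no data files to take a schema from.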



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
