[jira] [Commented] (ARROW-586) Problem with reading parquet files saved by Apache Spark

Wes McKinney (JIRA) Mon, 27 Feb 2017 05:05:32 -0800

    [ 
https://issues.apache.org/jira/browse/ARROW-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885731#comment-15885731
 ]


Wes McKinney commented on ARROW-586:
------------------------------------

Can you also try the following?

{code}
import pyarrow.parquet as pq

path = 'absolute/path/to/part-00000-2066b71b-c55f-411a-a682-4cc94ddb6d16.snappy.parquet'

pq.read_table(path)
{code}

Here "path" should be the absolute path to the offending file. Any other 
details about how you arrived at the error would be helpful (feel free to 
attach the file as well if it is not large).
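
If the list of files was built from bare file names, as the file name in the error message suggests it may have been, one way to rule out a working-directory problem is to resolve each name to an absolute path and confirm that the file exists before handing it to pyarrow. The following is only a rough sketch under that assumption; the helper name to_absolute_paths and its base_dir parameter are hypothetical and not part of pyarrow.

```python
import os


def to_absolute_paths(paths, base_dir=None):
    """Return absolute versions of `paths`, raising if any file is missing.

    Hypothetical helper, not part of pyarrow. `base_dir` is the directory
    that relative names are resolved against; it defaults to the current
    working directory.
    """
    base_dir = os.path.abspath(base_dir or os.getcwd())
    resolved = []
    for p in paths:
        # Leave absolute paths alone; join relative ones onto base_dir.
        full = p if os.path.isabs(p) else os.path.join(base_dir, p)
        full = os.path.normpath(full)
        if not os.path.isfile(full):
            # Fail early with the full path so the missing file is obvious.
            raise FileNotFoundError(full)
        resolved.append(full)
    return resolved


# Possible usage (requires pyarrow, not executed here); `files2` stands for
# the list of file names from the report and the directory is a placeholder:
#
#   import pyarrow.parquet as pq
#   abs_files = to_absolute_paths(files2, base_dir='/directory/with/the/files')
#   table = pq.read_table(abs_files[0])
```

If this raises FileNotFoundError, the problem is most likely the path rather than the contents of the Parquet file.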

> Problem with reading parquet files saved by Apache Spark
> --------------------------------------------------------
>
>                 Key: ARROW-586
>                 URL: https://issues.apache.org/jira/browse/ARROW-586
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.1.0
>            Reporter: Adam Szałucha
>
> When I try to read a parquet file saved by Apache Spark, I get the following 
> error:
> ---------------------------------------------------------------------------
> ArrowException                            Traceback (most recent call last)
> <ipython-input-14-6dfa089b9299> in <module>()
> ----> 1 table = pq.read_multiple_files(files2)
> /Users/adremja/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py in 
> read_multiple_files(paths, columns, filesystem, nthreads, metadata, schema)
>     141 
>     142     if metadata is None and schema is None:
> --> 143         schema = open_file(paths[0]).schema
>     144     elif schema is None:
>     145         schema = metadata.schema
> /Users/adremja/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py in 
> open_file(path, meta)
>     132     if filesystem is None:
>     133         def open_file(path, meta=None):
> --> 134             return ParquetFile(path, metadata=meta)
>     135     else:
>     136         def open_file(path, meta=None):
> /Users/adremja/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py in 
> __init__(self, source, metadata)
>      36     def __init__(self, source, metadata=None):
>      37         self.reader = ParquetReader()
> ---> 38         self.reader.open(source, metadata=metadata)
>      39 
>      40     @property
> /Users/adremja/miniconda3/lib/python3.6/site-packages/pyarrow/_parquet.pyx in 
> pyarrow._parquet.ParquetReader.open 
> (/Users/travis/miniconda3/conda-bld/recipe_1485750760150/work/arrow-7ac320bde52ae47007dadac7398e22a203c6a48d/python/build/temp.macosx-10.9-x86_64-3.6/_parquet.cxx:7144)()
> /Users/adremja/miniconda3/lib/python3.6/site-packages/pyarrow/io.pyx in 
> pyarrow.io.get_reader 
> (/Users/travis/miniconda3/conda-bld/recipe_1485750760150/work/arrow-7ac320bde52ae47007dadac7398e22a203c6a48d/python/build/temp.macosx-10.9-x86_64-3.6/io.cxx:9489)()
> /Users/adremja/miniconda3/lib/python3.6/site-packages/pyarrow/io.pyx in 
> pyarrow.io.MemoryMappedFile.__cinit__ 
> (/Users/travis/miniconda3/conda-bld/recipe_1485750760150/work/arrow-7ac320bde52ae47007dadac7398e22a203c6a48d/python/build/temp.macosx-10.9-x86_64-3.6/io.cxx:7732)()
> /Users/adremja/miniconda3/lib/python3.6/site-packages/pyarrow/error.pyx in 
> pyarrow.error.check_status 
> (/Users/travis/miniconda3/conda-bld/recipe_1485750760150/work/arrow-7ac320bde52ae47007dadac7398e22a203c6a48d/python/build/temp.macosx-10.9-x86_64-3.6/error.cxx:1197)()
> ArrowException: IOError: Failed to open file: 
> part-00000-2066b71b-c55f-411a-a682-4cc94ddb6d16.snappy.parquet



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
