Wes McKinney created ARROW-7952:
-----------------------------------

             Summary: [C++][Parquet] Error when failing to read original Arrow schema from Parquet metadata
                 Key: ARROW-7952
                 URL: https://issues.apache.org/jira/browse/ARROW-7952
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
            Reporter: Wes McKinney


I experienced the following failure

{code}
~/code/arrow/python/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetReader.open()
~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Tried reading schema message, was null or length 0
In ../src/parquet/arrow/reader_internal.cc, line 596, code: ::arrow::ipc::ReadSchema(&input, &dict_memo, out)
In ../src/parquet/arrow/reader_internal.cc, line 672, code: GetOriginSchema(metadata, &manifest->schema_metadata, &manifest->origin_schema)
{code}

when reading the following file

https://github.com/wesm/vldb-2019-apache-arrow-workshop/raw/1e9cf24bd6b8ae03e419e15ebc78b2e8135b8e7a/fec-2012.parquet

I don't know whether this file is malformed (it was generated from a
development version of Arrow), so this may not actually be a bug. Still, this
failure mode was unexpected, and I would like to understand why it happened.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
