Wes McKinney created ARROW-7952:
-----------------------------------
Summary: [C++][Parquet] Error when failing to read original Arrow
schema from Parquet metadata
Key: ARROW-7952
URL: https://issues.apache.org/jira/browse/ARROW-7952
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Reporter: Wes McKinney
I experienced the following failure:
{code}
~/code/arrow/python/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetReader.open()
~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: Tried reading schema message, was null or length 0
In ../src/parquet/arrow/reader_internal.cc, line 596, code:
::arrow::ipc::ReadSchema(&input, &dict_memo, out)
In ../src/parquet/arrow/reader_internal.cc, line 672, code:
GetOriginSchema(metadata, &manifest->schema_metadata, &manifest->origin_schema)
{code}
when reading the following file:
https://github.com/wesm/vldb-2019-apache-arrow-workshop/raw/1e9cf24bd6b8ae03e419e15ebc78b2e8135b8e7a/fec-2012.parquet
I don't know whether this file is malformed (it was generated from a
development version of Arrow), so this may not actually be a bug. However,
this failure mode was unexpected, and I would like to understand why it
happened.
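
One way to narrow this down is to look at the stored schema bytes directly. Arrow's Parquet writer keeps the original schema as a base64-encoded IPC message under the ARROW:schema key of the file's key-value metadata, so the length prefix of that value shows whether the message is empty. The sketch below is only a diagnostic aid, not a confirmed explanation of this failure. It assumes the base64 string has already been extracted with some other tool, and the helper name describe_schema_blob is hypothetical.

```python
import base64
import struct

# Marker that prefixes Arrow IPC messages in the format used since Arrow 0.15.
CONTINUATION = 0xFFFFFFFF


def describe_schema_blob(b64_value):
    """Classify a base64 'ARROW:schema' value by its IPC length prefix.

    Hypothetical helper: it inspects only the first few bytes and does not
    parse the Flatbuffers schema itself.
    """
    raw = base64.b64decode(b64_value)
    if len(raw) < 4:
        return "empty or truncated"

    (first,) = struct.unpack_from("<I", raw, 0)
    if first == CONTINUATION:
        # Current layout: 4-byte marker, then a little-endian int32 length.
        if len(raw) < 8:
            return "empty or truncated"
        (length,) = struct.unpack_from("<i", raw, 4)
        layout = "continuation-prefixed"
    else:
        # Older layout: the little-endian int32 length comes first.
        (length,) = struct.unpack_from("<i", raw, 0)
        layout = "legacy"

    if length == 0:
        return f"{layout}, zero-length message"
    if length < 0:
        return f"{layout}, invalid length prefix"
    return f"{layout}, {length}-byte message metadata"


if __name__ == "__main__":
    # Synthetic value: continuation marker followed by a zero length.
    sample = base64.b64encode(struct.pack("<Ii", CONTINUATION, 0)).decode()
    print(describe_schema_blob(sample))
```

A zero-length or missing message here would be consistent with the "was null or length 0" error, while a non-zero length would point toward a mismatch between the writer's and the reader's message layout instead.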