Peter Goldsborough created ARROW-10955:
------------------------------------------
Summary: Cannot read empty json lists and write them as parquet
Key: ARROW-10955
URL: https://issues.apache.org/jira/browse/ARROW-10955
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 2.0.0, 1.0.0, 0.17.1, 0.17.0
Environment: linux and mac
Reporter: Peter Goldsborough
We're using Arrow to convert JSON to Parquet, and our JSON occasionally
contains empty lists. Reading such JSON into an Arrow table and writing it to
Parquet currently fails. We noticed this issue in our C++ Arrow code, but it
also happens from Python.
Minimal repro:
input.json:
{"foo": []}
convert.py:
import pyarrow.json
import pyarrow.parquet
t = pyarrow.json.read_json("input.json")
pyarrow.parquet.write_table(t, "out.parquet")
Produces:
Traceback (most recent call last):
  File "repro.py", line 5, in <module>
    pyarrow.parquet.write_table(t, "out.parquet")
  File "/Users/pgoldsborough/anduril/capacitor/env/lib/python3.8/site-packages/pyarrow/parquet.py", line 1717, in write_table
    with ParquetWriter(
  File "/Users/pgoldsborough/anduril/capacitor/env/lib/python3.8/site-packages/pyarrow/parquet.py", line 554, in __init__
    self.writer = _parquet.ParquetWriter(
  File "pyarrow/_parquet.pyx", line 1409, in pyarrow._parquet.ParquetWriter.__cinit__
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: NullType Arrow field must be nullable