Karthik created ARROW-15142:
-------------------------------
Summary: Cannot mix struct and non-struct, non-null values error when saving nested types with PyArrow
Key: ARROW-15142
URL: https://issues.apache.org/jira/browse/ARROW-15142
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 6.0.1
Reporter: Karthik
When trying to save a pandas DataFrame containing nested types (list within list, list within dict) with the pyarrow engine, the following error is raised:
{color:#e75c58}ArrowInvalid{color}: ('cannot mix list and non-list, non-null values', 'Conversion failed for column A with type object')
Repro:
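{code:python}
import pandas as pd

# Row 0 of column "A" is a list that mixes integers (24, 27) with a nested list ([1, 1]).
x = pd.DataFrame({"A": [[24, 27, [1, 1]]]})
x.to_parquet("/tmp/a.pqt", engine="pyarrow")  # raises ArrowInvalid
{code}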
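For context: an Arrow list column has a single child type, so all values at the same nesting depth across the column must share one type. A list that holds both integers and a list has no such type. The helper below is a simplified pure-Python sketch of that check, not a pyarrow API: mixed_list_depths is a hypothetical name, and it treats only Python lists and tuples as list-like (dicts, which trigger the analogous "cannot mix struct and non-struct" message, are not modelled).

{code:python}
from typing import Any, List, Sequence


def _is_list_like(value: Any) -> bool:
    # Simplifying assumption: only Python lists and tuples count as list-like.
    return isinstance(value, (list, tuple))


def mixed_list_depths(column: Sequence[Any]) -> List[int]:
    """Return the nesting depths where list-like and scalar, non-None values mix.

    Depth 0 is the column's row values, depth 1 the elements inside each row's
    list, and so on. Values are pooled across all rows at each depth, because a
    list column has one child type for the whole column.
    """
    bad_depths: List[int] = []
    level = list(column)
    depth = 0
    while level:
        non_null = [v for v in level if v is not None]  # nulls never conflict
        lists = [v for v in non_null if _is_list_like(v)]
        if lists and len(lists) != len(non_null):
            bad_depths.append(depth)
        # Descend: the next level is every element of every list at this depth.
        level = [item for v in lists for item in v]
        depth += 1
    return bad_depths
{code}

For column A from the repro, mixed_list_depths([[24, 27, [1, 1]]]) returns [1]: the rows themselves are all lists, but inside row 0 the integers 24 and 27 sit next to the list [1, 1].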
A quick search suggests this is a known Arrow limitation. However, this data structure is commonly encountered, and fastparquet handles it seamlessly. Is there a proposed timeline or plan for fixing this?
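One possible stopgap, offered as an assumption rather than an official recommendation, is to store such ragged values as JSON text and decode them after reading. encode_nested and decode_nested below are hypothetical helpers for illustration only:

{code:python}
import json
from typing import Any, List, Optional


def encode_nested(values: List[Any]) -> List[Optional[str]]:
    # Serialize each row to a JSON string; None stays None so nulls survive.
    return [None if v is None else json.dumps(v) for v in values]


def decode_nested(values: List[Optional[str]]) -> List[Any]:
    # Inverse of encode_nested: parse each JSON string back into Python objects.
    return [None if s is None else json.loads(s) for s in values]
{code}

With pandas, something like x["A"] = encode_nested(x["A"].tolist()) before calling to_parquet should leave a plain string column that the pyarrow engine can write. The trade-off is that the Parquet file no longer carries a nested type, and tuples come back as lists after decoding.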