Thomas Buhrmann created ARROW-2711:
--------------------------------------

             Summary: [Python/C++] Pandas-Arrow doesn't roundtrip when column of lists has empty first element
                 Key: ARROW-2711
                 URL: https://issues.apache.org/jira/browse/ARROW-2711
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Thomas Buhrmann

Hi,

I thought this had been fixed in the past, but this simple use case still breaks:
{code:java}
import pandas as pd
import pyarrow

df = pd.DataFrame(dict(x=[[], ["a"]]))
tbl = pyarrow.Table.from_pandas(df)
print(tbl.schema)
{code}
This results in a wrongly inferred type of "list<item: null>":
{noformat}
x: list<item: null>
  child 0, item: null
__index_level_0__: int64
metadata
--------
{b'pandas': b'{"index_columns": ["__index_level_0__"], "column_indexes": [{"na'
            b'me": null, "field_name": null, "pandas_type": "unicode", "numpy_'
            b'type": "object", "metadata": {"encoding": "UTF-8"}}], "columns":'
            b' [{"name": "x", "field_name": "x", "pandas_type": "list[empty]",'
            b' "numpy_type": "object", "metadata": null}, {"name": null, "fiel'
            b'd_name": "__index_level_0__", "pandas_type": "int64", "numpy_typ'
            b'e": "int64", "metadata": null}], "pandas_version": "0.22.0"}'}
{noformat}
When converting the Table back to pandas, all elements are now None too:
{code:java}
df2 = tbl.to_pandas()
print(df2)

        x
0      []
1  [None]
{code}