[
https://issues.apache.org/jira/browse/ARROW-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wes McKinney reassigned ARROW-2711:
-----------------------------------
Assignee: Wes McKinney
> [Python/C++] Pandas-Arrow doesn't roundtrip when column of lists has empty
> first element
> ----------------------------------------------------------------------------------------
>
> Key: ARROW-2711
> URL: https://issues.apache.org/jira/browse/ARROW-2711
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Reporter: Thomas Buhrmann
> Assignee: Wes McKinney
> Priority: Major
> Fix For: 0.10.0
>
>
> Hi, I thought this had been fixed in the past, but this simple use case still
> breaks:
>
> {code:python}
> import pandas as pd
> import pyarrow
> df = pd.DataFrame(dict(x=[[], ["a"]]))
> tbl = pyarrow.Table.from_pandas(df)
> print(tbl.schema)
> {code}
> results in a wrong inferred type of "list<item: null>":
>
> {noformat}
> x: list<item: null>
> child 0, item: null
> __index_level_0__: int64
> metadata
> --------
> {b'pandas': b'{"index_columns": ["__index_level_0__"], "column_indexes": [{"na'
>             b'me": null, "field_name": null, "pandas_type": "unicode", "numpy_'
>             b'type": "object", "metadata": {"encoding": "UTF-8"}}], "columns":'
>             b' [{"name": "x", "field_name": "x", "pandas_type": "list[empty]",'
>             b' "numpy_type": "object", "metadata": null}, {"name": null, "fiel'
>             b'd_name": "__index_level_0__", "pandas_type": "int64", "numpy_typ'
>             b'e": "int64", "metadata": null}], "pandas_version": "0.22.0"}'}
> {noformat}
> When converting the Table back to pandas, the string values inside the
> lists are lost and come back as None:
>
> {code:python}
> df2 = tbl.to_pandas()
> print(df2)
> # Output:
> #         x
> # 0      []
> # 1  [None]
> {code}
>
>