[ https://issues.apache.org/jira/browse/ARROW-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2711:
--------------------------------
    Fix Version/s: 0.10.0

> [Python/C++] Pandas-Arrow doesn't roundtrip when column of lists has empty first element
> ----------------------------------------------------------------------------------------
>
>                 Key: ARROW-2711
>                 URL: https://issues.apache.org/jira/browse/ARROW-2711
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Thomas Buhrmann
>            Priority: Major
>             Fix For: 0.10.0
>
>
> Hi, I thought this had been fixed in the past, but this simple use case still breaks:
>  
> {code:python}
> import pandas as pd
> import pyarrow
>
> df = pd.DataFrame(dict(x=[[], ["a"]]))
> tbl = pyarrow.Table.from_pandas(df)
> print(tbl.schema)
> {code}
> This infers the wrong type, "list<item: null>", because the empty first list is used to determine the value type:
>  
> {noformat}
> x: list<item: null>
>   child 0, item: null
> __index_level_0__: int64
> metadata
> --------
> {b'pandas': b'{"index_columns": ["__index_level_0__"], "column_indexes": [{"na'
>             b'me": null, "field_name": null, "pandas_type": "unicode", "numpy_'
>             b'type": "object", "metadata": {"encoding": "UTF-8"}}], "columns":'
>             b' [{"name": "x", "field_name": "x", "pandas_type": "list[empty]",'
>             b' "numpy_type": "object", "metadata": null}, {"name": null, "fiel'
>             b'd_name": "__index_level_0__", "pandas_type": "int64", "numpy_typ'
>             b'e": "int64", "metadata": null}], "pandas_version": "0.22.0"}'}
> {noformat}
> Converting the Table back to pandas also loses the data: the string element of the second list comes back as None:
>  
> {code:python}
> df2 = tbl.to_pandas()
> print(df2)
> # Output:
> #        x
> # 0     []
> # 1 [None]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
