[jira] [Commented] (ARROW-9096) data type "integer" not understood: pandas roundtrip

Joris Van den Bossche (Jira) Thu, 11 Jun 2020 04:03:00 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133155#comment-17133155
 ]


Joris Van den Bossche commented on ARROW-9096:
----------------------------------------------

Thanks for the report. A smaller reproducer:

{code}
import numpy as np
import pandas as pd
import pyarrow as pa
df = pd.DataFrame(np.random.randn(5, 1), columns=pd.Index([1], dtype=object))
table = pa.Table.from_pandas(df)
table.to_pandas()  # data type "integer" not understood
{code}

So the trigger is an object-dtype column index whose labels are integers. We 
try to preserve the dtype of the column labels on a roundtrip (that is why it 
is stored in the pandas metadata), but this case is clearly not covered.
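The failure mechanism can be illustrated with pandas and NumPy alone. This is a minimal sketch, assuming (from the error message and the report quoted below) that the column index's pandas_type is recorded from pandas' logical type inference and later handed to np.dtype when the labels are rebuilt; it is not pyarrow's actual code, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Column labels from the reproducer: integers held in an object-dtype Index
labels = pd.Index([1], dtype=object)

# pandas' logical type inference names these labels "integer"
logical_type = pd.api.types.infer_dtype(labels)

# "integer" is a logical type name, not a NumPy dtype string, so passing it
# straight to np.dtype() raises the "not understood" error from this report
try:
    np.dtype(logical_type)
    restore_error = None
except TypeError as exc:
    restore_error = str(exc)

print(logical_type)  # integer
print(restore_error)
```

A fix would therefore need to map such logical names back to a real dtype (object, in this case) instead of passing them to NumPy unchanged.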

Contributions to fix this are always welcome.

> data type "integer" not understood: pandas roundtrip
> ----------------------------------------------------
>
>                 Key: ARROW-9096
>                 URL: https://issues.apache.org/jira/browse/ARROW-9096
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.17.1
>            Reporter: Richard Wu
>            Priority: Minor
>
> The following fails the roundtrip because the pandas_type of the column index 
> is converted from int64 to integer when an additional column is introduced 
> and subsequently moved to the index:
>
> {code:python}
> import numpy as np
> import pandas as pd
> import pyarrow
> df = pd.DataFrame(np.ones((3, 1)), index=[[1, 2, 3]])
> df['foo'] = np.arange(3)
> df = df.set_index('foo', append=True)
> table = pyarrow.Table.from_pandas(df)
> table.to_pandas()  # Errors
> {code}
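To connect the original report with the smaller reproducer above, the following sketch (pandas and NumPy only, no pyarrow) traces how the reporter's frame ends up with the same object-dtype column index holding an integer label. It assumes current pandas behavior for column insertion and set_index; the intermediate variable name is illustrative.

```python
import numpy as np
import pandas as pd

# Frame from the report, with the parentheses balanced
df = pd.DataFrame(np.ones((3, 1)), index=[[1, 2, 3]])
# Columns start out as a RangeIndex holding the single integer label 0
before = df.columns.dtype

# Adding a string label forces the column Index to object dtype: [0, 'foo']
df["foo"] = np.arange(3)
# Moving 'foo' into the index leaves only the integer label 0,
# but the column Index keeps its object dtype
df = df.set_index("foo", append=True)

print(before)                                # int64
print(df.columns.dtype)                      # object
print(pd.api.types.infer_dtype(df.columns))  # integer
```

This is exactly the "integer labels in an object-dtype index" situation that the smaller reproducer constructs directly with pd.Index([1], dtype=object).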



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
