[ 
https://issues.apache.org/jira/browse/ARROW-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888163#comment-16888163
 ] 

Thomas Buhrmann commented on ARROW-5379:
----------------------------------------

For the particular case of pd.Int64Dtype, the following may serve as a 
workaround for now, in case it's useful to anybody. In short: cast pandas 
Int64 columns to 'object' before converting to Arrow. When converting back to 
pandas, call to_pandas with _integer_object_nulls=True_ and cast the resulting 
object columns back to Int64. This seems to work correctly for the cases 
below: pandas integer columns with or without missing values, and different 
integer sizes:

 
{code:python}
import pandas as pd
import pyarrow as pa


def from_pandas(df):
    """Cast Int64 to object before 'serializing'"""
    # Work on a copy so the caller's DataFrame keeps its Int64 columns.
    df = df.copy()
    for col in df:
        if isinstance(df[col].dtype, pd.Int64Dtype):
            df[col] = df[col].astype('object')
    return pa.Table.from_pandas(df)


def to_pandas(tbl):
    """After 'deserializing', recover the correct int type"""
    df = tbl.to_pandas(integer_object_nulls=True)

    for col in df:
        is_arrow_int = pa.types.is_integer(tbl.schema.field_by_name(col).type)
        if is_arrow_int and pd.api.types.is_object_dtype(df[col].dtype):
            df[col] = df[col].astype('Int64')

    return df


df = pd.Series([0, 1, None, 2, 822215679726100500], dtype='Int64',
               name='x').to_frame()
# Alternative test cases (uncomment one to try it):
# df = pd.Series([0, 1, 3, 2, 822215679726100500], dtype='Int64',
#                name='x').to_frame()
# df = pd.Series([0, 1, 3, 2, 15], dtype='Int64', name='x').to_frame()
# df = pd.Series([0, 1, 3, 2, 15], dtype='int16', name='x').to_frame()

df2 = to_pandas(from_pandas(df))
print(df2.dtypes)
{code}
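
As a side note on why the round trip goes through 'object' rather than relying 
on the default conversion: without integer_object_nulls=True, to_pandas turns 
an integer column that contains nulls into float64, and float64 cannot 
represent every 64-bit integer exactly. The following is a minimal pure-Python 
sketch (no pandas or pyarrow involved; the variable names are only 
illustrative) that uses the large value from the test Series above to show the 
kind of rounding such a float detour would introduce:

```python
# Pure-Python illustration; it does not call pandas or pyarrow.
big = 822215679726100500  # the large value from the test Series above

# float64 has a 53-bit significand, so not every integer above 2**53
# can be stored exactly.
as_float = float(big)
round_tripped = int(as_float)

print(big > 2**53)           # is the value outside the exactly representable range?
print(round_tripped == big)  # did the value survive the float detour?
print(round_tripped - big)   # size of the rounding error, if any
```

For this value the float detour does not return the original integer, which is 
why the workaround keeps the values as Python objects until they are cast back 
to Int64.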
 

> [Python] support pandas' nullable Integer type in from_pandas
> -------------------------------------------------------------
>
>                 Key: ARROW-5379
>                 URL: https://issues.apache.org/jira/browse/ARROW-5379
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> From https://github.com/apache/arrow/issues/4168. We should add support for 
> pandas' nullable Integer extension dtypes, as those could map nicely to 
> Arrow's integer types. 
> Ideally this happens in a generic way, though, and is not specific to this 
> extension type; that is discussed in ARROW-5271.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)