[ https://issues.apache.org/jira/browse/ARROW-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888163#comment-16888163 ]
Thomas Buhrmann commented on ARROW-5379:
----------------------------------------
For the particular case of pd.Int64Dtype, the following may serve as a workaround
for now, in case it's useful to anybody. In short: cast pandas Int64 columns to
'object' before converting to Arrow. When converting back to pandas, call
to_pandas with _integer_object_nulls=True_ and cast the resulting object columns
back to Int64. This seems to work correctly for the cases below (pandas integer
columns with and without NaNs, and different integer sizes):
{code:python}
import pandas as pd
import pyarrow as pa


def from_pandas(df):
    """Cast Int64 columns to object before 'serializing' to Arrow."""
    df = df.copy()  # avoid mutating the caller's DataFrame
    for col in df:
        if isinstance(df[col].dtype, pd.Int64Dtype):
            df[col] = df[col].astype('object')
    return pa.Table.from_pandas(df)


def to_pandas(tbl):
    """After 'deserializing', recover the nullable Int64 type."""
    # integer_object_nulls=True keeps integer columns that contain nulls as
    # Python ints (object dtype) instead of converting them to float64.
    df = tbl.to_pandas(integer_object_nulls=True)
    for col in df:
        if (pa.types.is_integer(tbl.schema.field(col).type) and
                pd.api.types.is_object_dtype(df[col].dtype)):
            df[col] = df[col].astype('Int64')
    return df


df = pd.Series([0, 1, None, 2, 822215679726100500], dtype='Int64',
               name='x').to_frame()
# df = pd.Series([0, 1, 3, 2, 822215679726100500], dtype='Int64',
#                name='x').to_frame()
# df = pd.Series([0, 1, 3, 2, 15], dtype='Int64', name='x').to_frame()
# df = pd.Series([0, 1, 3, 2, 15], dtype='int16', name='x').to_frame()
df2 = to_pandas(from_pandas(df))
print(df2.dtypes)
{code}
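Why _integer_object_nulls=True_ matters: by default, Table.to_pandas converts an Arrow integer column that contains nulls to float64, and float64 cannot represent integers above 2**53 exactly, so a value like 822215679726100500 from the example above would be silently rounded. A small check, assuming a recent pyarrow (the helper name default_vs_object is hypothetical, used only for illustration):

{code:python}
import pyarrow as pa

BIG = 822215679726100500  # larger than 2**53, so not exactly representable as float64


def default_vs_object():
    """Hypothetical helper: convert the same Arrow table to pandas two ways."""
    tbl = pa.table({"x": pa.array([0, None, BIG], type=pa.int64())})
    as_float = tbl.to_pandas()                            # the null forces float64
    as_object = tbl.to_pandas(integer_object_nulls=True)  # Python ints, nulls kept as nulls
    return as_float, as_object


as_float, as_object = default_vs_object()
print(as_float["x"].dtype, int(as_float["x"][2]) == BIG)  # float64 False
print(as_object["x"].dtype, as_object["x"][2] == BIG)     # object True
{code}

The object route is slower and uses more memory than a numeric column, which is why the workaround casts back to Int64 right afterwards.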
> [Python] support pandas' nullable Integer type in from_pandas
> -------------------------------------------------------------
>
> Key: ARROW-5379
> URL: https://issues.apache.org/jira/browse/ARROW-5379
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
>
> From https://github.com/apache/arrow/issues/4168. We should add support for
> pandas' nullable Integer extension dtypes, as those could map nicely to
> Arrow's integer types.
> Ideally this happens in a generic way though, and not specific to this
> extension type, which is discussed in ARROW-5271.
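As a rough illustration of the mapping the issue describes, a nullable Int64 series can be turned into an Arrow int64 array by passing its missing-value flags as a validity mask, with no detour through float64. This is a sketch, not how pyarrow implements the conversion; the helper name nullable_int_to_arrow is hypothetical, and it assumes pandas >= 1.0 for Series.to_numpy(na_value=...).

{code:python}
import numpy as np
import pandas as pd
import pyarrow as pa


def nullable_int_to_arrow(s: pd.Series) -> pa.Array:
    """Hypothetical helper: build an Arrow int64 array from a pandas Int64 series."""
    # True where the pandas value is missing; pa.array treats True in `mask` as null.
    mask = s.isna().to_numpy()
    # Fill missing slots with a placeholder; they are masked out anyway.
    values = s.to_numpy(dtype=np.int64, na_value=0)
    return pa.array(values, mask=mask, type=pa.int64())


s = pd.Series([0, 1, None, 2], dtype="Int64")
arr = nullable_int_to_arrow(s)
print(arr.type, arr.null_count)
{code}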