Albert Shieh created ARROW-2205:
-----------------------------------

             Summary: Option for integer object nulls
                 Key: ARROW-2205
                 URL: https://issues.apache.org/jira/browse/ARROW-2205
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++, Python
            Reporter: Albert Shieh


I have a use case where the loss of precision from casting integers to floats
matters. pandas can store integers with nulls in object columns without losing
precision. However, a roundtrip through Arrow casts such object columns to
float columns, even though Arrow itself stores them as integers with nulls.
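
To illustrate the kind of precision loss involved, here is a small sketch that is independent of Arrow. The value 2**53 + 1 is my own choice for illustration: it is the smallest positive integer that a 64-bit float cannot represent exactly.

```python
import numpy as np

# Smallest positive integer that float64 cannot represent exactly.
big = 2**53 + 1

# Casting to float64 rounds to the nearest representable value.
as_float = np.float64(big)

# Converting back to int does not recover the original integer.
recovered = int(as_float)

print(big, recovered, big == recovered)
```

Any integer column that is cast to float64 because it contains a null is exposed to this rounding for values above 2**53 in magnitude.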

This is a minimal example demonstrating the behavior of a roundtrip:
{code}
import numpy as np
import pandas as pd
import pyarrow as pa

# Object column holding a null and an integer.
df = pd.DataFrame({"a": np.array([None, 1], dtype=object)})
# Roundtrip: pandas -> Arrow table -> pandas.
df_pa = pa.Table.from_pandas(df).to_pandas()

print(df)
print(df_pa)
{code}
The output is:
{code}
      a
0  None
1     1
     a
0  NaN
1  1.0
{code}
This appears to be the intended behavior, given test_int_object_nulls in
test_convert_pandas.

I think it would be useful to add an option to the to_pandas methods that
returns integer columns containing nulls as object columns. The option can
default to False in order to preserve the current behavior.



