Albert Shieh created ARROW-2205:
-----------------------------------

             Summary: Option for integer object nulls
                 Key: ARROW-2205
                 URL: https://issues.apache.org/jira/browse/ARROW-2205
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++, Python
            Reporter: Albert Shieh

I have a use case where the loss of precision in casting integers to floats matters, and pandas supports storing integers with nulls without loss of precision in object columns. However, a roundtrip through arrow will cast the object columns to float columns, even though the object columns are stored in arrow as integers with nulls.

This is a minimal example demonstrating the behavior of a roundtrip:
{code}
import numpy as np
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"a": np.array([None, 1], dtype=object)})
df_pa = pa.Table.from_pandas(df).to_pandas()

print(df)
print(df_pa)
{code}

The output is:
{code}
      a
0  None
1     1
     a
0  NaN
1  1.0
{code}

This seems to be the desired behavior, given test_int_object_nulls in test_convert_pandas.

I think it would be useful to add an option in the to_pandas methods to allow integers with nulls to be returned as object columns. The option can default to false in order to preserve the current behavior.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
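
To illustrate the precision concern raised in the report, here is a minimal sketch using only plain Python floats, which are IEEE 754 doubles like a float64 column. The helper name `roundtrip_through_float` and the chosen values are hypothetical and not taken from the report; the sketch only shows that integers above 2**53 may not survive a cast to float.

```python
# Illustrative sketch: why casting integers to float64 can lose precision.
# The helper name and sample values are assumptions made for this example.

def roundtrip_through_float(value: int) -> int:
    """Cast an integer to a float (IEEE 754 double) and back to an integer."""
    return int(float(value))


# A double has a 53-bit significand, so every integer up to 2**53
# is represented exactly.
exact = 2**53
# 2**53 + 1 needs 54 bits and is rounded to the nearest representable double.
inexact = 2**53 + 1

print(roundtrip_through_float(exact) == exact)      # True
print(roundtrip_through_float(inexact) == inexact)  # False
```

Small values such as the `1` in the example above come back unchanged apart from their type (`1` becomes `1.0`); the loss only appears for integers that need more than 53 bits, which is the case an object column avoids.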