Uwe L. Korn created ARROW-2806:
----------------------------------

             Summary: [Python] Inconsistent handling of np.nan
                 Key: ARROW-2806
                 URL: https://issues.apache.org/jira/browse/ARROW-2806
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
            Reporter: Uwe L. Korn
             Fix For: 0.10.0
Currently we handle {{np.nan}} differently depending on whether the input to {{pa.array()}} is a list or a numpy array:

{code}
>>> pa.array(np.array([1, np.nan]))
<pyarrow.lib.DoubleArray object at 0x11680bea8>
[
  1.0,
  nan
]

>>> pa.array([1., np.nan])
<pyarrow.lib.DoubleArray object at 0x10bdacbd8>
[
  1.0,
  NA
]
{code}

I would actually think the last one is the correct one, especially once one casts this to an integer column: there the first one produces a column with INT_MIN, whereas the second one produces a real null. But in {{test_array_conversions_no_sentinel_values}} we check that {{np.nan}} does not produce a null.

Even weirder:

{code}
>>> df = pd.DataFrame({'a': [1., None]})
>>> df
     a
0  1.0
1  NaN

>>> pa.Table.from_pandas(df).column(0)
<Column name='a' type=DataType(double)>
chunk 0: <pyarrow.lib.DoubleArray object at 0x104bbf958>
[
  1.0,
  NA
]
{code}
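For context, a minimal sketch of how the difference surfaces, assuming {{Array.cast}} and {{null_count}} are available on this version; the expected values in the comments reflect the output above and the 0.9.0 behaviour described in this report, and may differ in other releases:

{code}
import numpy as np
import pyarrow as pa

# NaN inside a NumPy array is kept as a floating-point value, not a null.
from_numpy = pa.array(np.array([1., np.nan]))
print(from_numpy.null_count)  # expected: 0 -- NaN is stored as a value

# NaN inside a plain Python list is converted to a null.
from_list = pa.array([1., np.nan])
print(from_list.null_count)   # expected: 1 -- NaN became a real null

# The cast to an integer type is where the inconsistency bites: a null
# stays a null, while a NaN value has no integer representation (the
# report observes INT_MIN for the NumPy-backed case).
print(from_list.cast(pa.int64()))
{code}

If both code paths treated {{np.nan}} as a null (as the list path does today), the cast-to-integer results would be consistent as well.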