Matthew Gilbert created ARROW-2135:

             Summary: from_pandas improperly casting NaNs
                 Key: ARROW-2135
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
            Reporter: Matthew Gilbert

If you create a {{Table}} from a {{DataFrame}} of ints that contains a NaN 
value, the NaN is improperly cast. pandas stores such a column as floats, and 
when the column is converted to a table with an integer schema, the NaN is 
interpreted as an integer rather than as a null. This seems like a bug, since a 
known limitation in pandas (the inability to have null-valued integer data) is 
taking precedence over arrow's functionality to store these as an IntArray with 
nulls.

import pyarrow as pa
import pandas as pd

df = pd.DataFrame({"a": [1, 2, float("nan")]})
schema = pa.schema([pa.field("a", pa.int64(), nullable=True)])
table = pa.Table.from_pandas(df, schema=schema)

<pyarrow.lib.Column object at 0x7f2151d19c90>
chunk 0: <pyarrow.lib.Int64Array object at 0x7f213bf356d8>
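
As a possible workaround until the conversion handles this case, the NaNs can be turned into None before the values reach arrow. The sketch below is only an illustration, not the library's own approach: the helper names floats_to_optional_ints and to_nullable_int64 are hypothetical, and it assumes every non-missing value in the column is a whole number.

```python
import math


def floats_to_optional_ints(values):
    """Map NaN (or None) to None and every other value to int.

    Hypothetical helper: raises ValueError for values that are not whole
    numbers, so nothing is silently truncated.
    """
    result = []
    for value in values:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            result.append(None)
        elif float(value).is_integer():
            result.append(int(value))
        else:
            raise ValueError(f"{value!r} is not a whole number")
    return result


def to_nullable_int64(values):
    """Build a pyarrow int64 array in which missing values are nulls."""
    # Imported here so the pure-Python helper above has no dependencies.
    import pyarrow as pa

    return pa.array(floats_to_optional_ints(values), type=pa.int64())
```

With the DataFrame from the report, to_nullable_int64(df["a"].tolist()) should give an Int64Array holding 1, 2 and one null, instead of a NaN that has been cast to an integer.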

