[
https://issues.apache.org/jira/browse/ARROW-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858547#comment-16858547
]
Joris Van den Bossche commented on ARROW-2298:
----------------------------------------------
[~farnoy] For me, the example you show above works:
{code}
In [33]: schema = pa.schema([pa.field(name='a', type=pa.int64(), nullable=True)])
In [34]: pa.Table.from_pandas(df, schema=schema, preserve_index=False)
Out[34]:
pyarrow.Table
a: int64
metadata
--------
{b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [{"name":'
b' "a", "field_name": "a", "pandas_type": "int64", "numpy_type": "'
b'float64", "metadata": null}], "creator": {"library": "pyarrow", '
b'"version": "0.13.1.dev313+g997226a9"}, "pandas_version": "0.24.2'
b'"}'}
In [35]: table = _
In [36]: table.column('a')
Out[36]:
<Column name='a' type=DataType(int64)>
[
[
null,
1,
2,
3,
null
]
]
{code}
This works because in {{Table.from_pandas}} we assume the data are coming from
pandas, where NaN marks a missing value, and so we allow the above.
You can see the same behaviour using just the array API, when converting a
float numpy array to an integer arrow array:
{code:python}
In [41]: pa.array(np.array([1, 2, np.nan], dtype=float), type=pa.int64())
...
ArrowInvalid: Floating point value truncated
In [42]: pa.array(np.array([1, 2, np.nan], dtype=float), type=pa.int64(), from_pandas=True)
Out[42]:
<pyarrow.lib.Int64Array object at 0x7feaeea36548>
[
1,
2,
null
]
{code}
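To make the difference between the two calls explicit, here is a rough pure-Python sketch of the semantics. The helper {{floats_to_nullable_ints}} and its {{nan_as_null}} flag are hypothetical names for illustration only; this is not how pyarrow implements the conversion.

```python
import math


def floats_to_nullable_ints(values, nan_as_null=False):
    """Hypothetical helper (not part of pyarrow) mimicking the two behaviours.

    With nan_as_null=False a NaN cannot become an integer, so it is an error.
    With nan_as_null=True a NaN is treated as a missing value (None).
    """
    result = []
    for value in values:
        if math.isnan(value):
            if not nan_as_null:
                raise ValueError("Floating point value truncated")
            result.append(None)
        elif not value.is_integer():
            # Fractional (or infinite) values cannot be stored losslessly.
            raise ValueError("Floating point value truncated")
        else:
            result.append(int(value))
    return result


print(floats_to_nullable_ints([1.0, 2.0, float("nan")], nan_as_null=True))
```

In pyarrow itself the switch is the {{from_pandas=True}} argument shown above.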
Does that satisfy your use case?
Note that {{from_pandas=True}} might not help for very big integers that cannot
be represented exactly as floats (those will still raise an error about values
being truncated), but I think if you are coming from pandas, that use case will
not be very frequent, exactly because pandas cannot properly represent those
values itself.
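As a small illustration of the "very big integers" caveat, the following plain-Python check shows where float64 stops representing integers exactly. The function name {{survives_float64}} is made up for this example, and it only relies on Python floats being IEEE 754 doubles.

```python
def survives_float64(n):
    """Return True if the integer n round-trips through a float64 unchanged.

    Hypothetical helper for illustration: Python's float is an IEEE 754
    double with a 53-bit significand, so not every integer above 2**53
    can be stored exactly.
    """
    return int(float(n)) == n


print(survives_float64(2**53))      # still exact
print(survives_float64(2**53 + 1))  # rounded to a neighbouring float
```

An integer column holding such values would already have lost precision once it was stored as floats with NaN on the pandas side.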
> [Python] Add option to not consider NaN to be null when converting to an
> integer Arrow type
> -------------------------------------------------------------------------------------------
>
> Key: ARROW-2298
> URL: https://issues.apache.org/jira/browse/ARROW-2298
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> Follow-on work to ARROW-2135