[
https://issues.apache.org/jira/browse/ARROW-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858724#comment-16858724
]
Joris Van den Bossche commented on ARROW-1989:
----------------------------------------------
Looking into this, but I can't find a reproducible example that gives an
error similar to the one reported above. Does somebody have a concrete example?
With the latest pandas and pyarrow (and likewise with pandas 0.24.2 /
pyarrow 0.12), the closest I can get is something like this (a timestamp with
a lower resolution whose value is out of bounds for pandas):
{code:python}
In [63]: a = pa.array([datetime.datetime(1018, 12, 12)], type=pa.timestamp('s'))
In [64]: a.to_pandas()
Out[64]: array(['1018-12-12T00:00:00'], dtype='datetime64[s]')
In [65]: table = pa.Table.from_pydict({'a': a})
In [66]: table
Out[66]:
pyarrow.Table
a: timestamp[s]
In [67]: table.to_pandas()
Out[67]:
                              a
0 2188-01-19 23:09:07.419103232
{code}
This result is wrong, however, and it is produced silently, with no error or
warning. This is a bug in pandas, described in
https://issues.apache.org/jira/browse/ARROW-3176
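The value shown in Out[67] is consistent with a signed 64-bit overflow when
the second-resolution value is scaled to nanoseconds. Below is a minimal
sketch of that arithmetic in plain Python. It only illustrates the wrap-around
and is not the actual pandas or pyarrow code path; the helper names
wrap_int64 and seconds_to_wrapped_ns are made up for this example.
{code:python}
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def wrap_int64(n):
    """Reduce a Python int to the value a signed 64-bit integer would hold."""
    return (n + 2**63) % 2**64 - 2**63


def seconds_to_wrapped_ns(dt):
    """Scale whole seconds since the epoch to nanoseconds with int64 wrap-around."""
    seconds = (dt - EPOCH) // datetime.timedelta(seconds=1)
    return wrap_int64(seconds * 10**9)


ns = seconds_to_wrapped_ns(datetime.datetime(1018, 12, 12))

# Split the wrapped value back into whole seconds and leftover nanoseconds,
# because datetime.datetime only has microsecond resolution.
whole_seconds, frac_ns = divmod(ns, 10**9)
wrapped = EPOCH + datetime.timedelta(seconds=whole_seconds)

print(wrapped, frac_ns)
# 2188-01-19 23:09:07 419103232
{code}
The year 1018 lies roughly 3.0e19 ns before the epoch, well outside the
int64 range of about +/-9.2e18 ns, so the scaled value wraps around twice and
lands in 2188.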
> [Python] Better UX on timestamp conversion to Pandas
> ----------------------------------------------------
>
> Key: ARROW-1989
> URL: https://issues.apache.org/jira/browse/ARROW-1989
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Uwe L. Korn
> Priority: Major
> Fix For: 0.14.0
>
>
> Converting timestamp columns to Pandas, users often have the problem that
> they have dates that are larger than Pandas can represent with their
> nanosecond representation. Currently they simply see an Arrow exception and
> think that this problem is caused by Arrow. We should try to change the error
> from
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX
> {code}
> to something along the lines of
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data:
> XX. This conversion is needed as Pandas does only support nanosecond
> timestamps. Your data is likely out of the range that can be represented with
> nanosecond resolution.
> {code}