[ 
https://issues.apache.org/jira/browse/ARROW-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858724#comment-16858724
 ] 

Joris Van den Bossche commented on ARROW-1989:
----------------------------------------------

Looking into this, but I can't find a reproducible example that gives an 
error similar to the one reported above. Does somebody have a concrete example?

With the latest pandas and pyarrow (and likewise with pandas 0.24.2 / pyarrow 
0.12), the closest I can get is something like this (a timestamp with a lower 
resolution that is out of bounds for pandas' nanosecond range):

{code:python}
In [63]: a = pa.array([datetime.datetime(1018, 12, 12)], type=pa.timestamp('s'))

In [64]: a.to_pandas()
Out[64]: array(['1018-12-12T00:00:00'], dtype='datetime64[s]')

In [65]: table = pa.Table.from_pydict({'a': a})

In [66]: table
Out[66]: 
pyarrow.Table
a: timestamp[s]

In [67]: table.to_pandas()
Out[67]: 
                              a
0 2188-01-19 23:09:07.419103232
{code}

This result is wrong, and silently so: no error is raised. This is a bug in 
pandas, and is described in https://issues.apache.org/jira/browse/ARROW-3176
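
For illustration, the value in Out[67] is consistent with a plain int64 
overflow: the seconds-since-epoch value is multiplied by 10**9 without a bounds 
check and wraps around. This is my assumption about the mechanism, not 
something I have traced in the pandas source. A minimal sketch using only the 
standard library (wrap_int64 and naive_seconds_to_ns are hypothetical helper 
names, not pandas or pyarrow functions):

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def wrap_int64(value):
    """Reduce a Python int to the value a signed 64-bit integer would hold."""
    return (value + 2**63) % 2**64 - 2**63


def naive_seconds_to_ns(dt):
    """Convert to nanoseconds since the epoch with no bounds check (assumed mechanism)."""
    # Whole seconds since the epoch, as stored in a timestamp('s') array.
    seconds = (dt - EPOCH) // datetime.timedelta(seconds=1)
    # For dates far outside the nanosecond range this product exceeds int64
    # and wraps around instead of raising.
    return wrap_int64(seconds * 10**9)


ns = naive_seconds_to_ns(datetime.datetime(1018, 12, 12))
# datetime only has microsecond resolution, so the last three digits are dropped.
wrapped = EPOCH + datetime.timedelta(microseconds=ns // 1000)
print(ns)
print(wrapped)
```

This prints 6881065747419103232 and 2188-01-19 23:09:07.419103, which matches 
the value in the DataFrame above up to microsecond precision.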

> [Python] Better UX on timestamp conversion to Pandas
> ----------------------------------------------------
>
>                 Key: ARROW-1989
>                 URL: https://issues.apache.org/jira/browse/ARROW-1989
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>             Fix For: 0.14.0
>
>
> When converting timestamp columns to Pandas, users often run into dates that 
> fall outside the range Pandas can represent with its nanosecond 
> resolution. Currently they simply see an Arrow exception and think that this 
> problem is caused by Arrow. We should try to change the error 
> from
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX
> {code}
> to something along the lines of 
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 
> XX. This conversion is needed as Pandas does only support nanosecond 
> timestamps. Your data is likely out of the range that can be represented with 
> nanosecond resolution.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
