[ 
https://issues.apache.org/jira/browse/ARROW-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870886#comment-16870886
 ] 

Joris Van den Bossche commented on ARROW-3176:
----------------------------------------------

I fixed the issue on the pandas side, meaning that you no longer get an 
incorrect date (e.g. "1677-09-21 00:25:26.290448384" instead of 
"2262-04-12"), but an error: 

{code}
In [7]: pa.column('name', arr).to_pandas(date_as_object=False) 
...
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2262-04-12 00:00:00

In [8]: pa.column('name', arr).to_pandas(date_as_object=True) 
Out[8]: 
0    2262-04-12
Name: name, dtype: object
{code}

Since pandas only supports datetime64[ns] at the moment, I think that is the 
best we can do: you get an error for dates that are out of bounds for 
datetime64[ns], and in that case you should use the default 
{{date_as_object=True}}.
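
For reference, the previously returned value is consistent with a signed 64-bit wraparound when the day count is scaled to nanoseconds. The following is a minimal stdlib-only sketch of that arithmetic, not the actual pandas or pyarrow code path; the helper names ({{wrap_int64}}, {{days_since_epoch}}, {{wrapped_timestamp}}) are made up for illustration.

```python
import datetime

NS_PER_DAY = 86_400 * 10**9          # nanoseconds in one day
EPOCH = datetime.date(1970, 1, 1)


def wrap_int64(value):
    """Interpret an arbitrary Python int as a two's-complement int64."""
    return (value + 2**63) % 2**64 - 2**63


def days_since_epoch(d):
    """Number of days between 1970-01-01 and the given date."""
    return (d - EPOCH).days


def wrapped_timestamp(d):
    """Scale a date to nanoseconds, wrap to int64, and decode the result.

    Returns (datetime at whole-second resolution, leftover nanoseconds),
    because datetime.datetime cannot hold nanoseconds itself.
    """
    ns = wrap_int64(days_since_epoch(d) * NS_PER_DAY)
    seconds, ns_rem = divmod(ns, 10**9)
    dt = datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=seconds)
    return dt, ns_rem


dt, ns_rem = wrapped_timestamp(datetime.date(2262, 4, 12))
print(dt, ns_rem)  # 1677-09-21 00:25:26 290448384
```

2262-04-11 is the last date whose midnight still fits, which is why the overflow starts exactly at 2262-04-12.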

The fix will be included in pandas 0.25.0.

For me, this issue can then be closed, as we can rely on the fixed pandas 
behaviour.  
One alternative that could be implemented in pyarrow is to detect out-of-bounds 
dates and then always return objects instead of datetime64. But since returning 
objects is already the default behaviour, I personally don't think we 
should "ignore" the user-specified keyword {{date_as_object=False}} in those 
cases.
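
A user who does want that fallback could do the bounds check on their side before converting. A rough sketch, assuming the dates are available as Python {{datetime.date}} objects; {{needs_object_fallback}} is a hypothetical helper, not part of pyarrow or pandas:

```python
import datetime

# First and last calendar dates whose midnight fits in a signed 64-bit
# count of nanoseconds since 1970-01-01 (the datetime64[ns] range).
MIN_NS_DATE = datetime.date(1677, 9, 22)
MAX_NS_DATE = datetime.date(2262, 4, 11)


def needs_object_fallback(dates):
    """Return True if any non-null date falls outside the datetime64[ns] range."""
    return any(
        d is not None and not (MIN_NS_DATE <= d <= MAX_NS_DATE)
        for d in dates
    )


# Hypothetical usage with the array from the issue description:
#   as_object = needs_object_fallback(arr.to_pylist())
#   pa.column('name', arr).to_pandas(date_as_object=as_object)
print(needs_object_fallback([datetime.date(2262, 4, 12)]))  # True
```

This keeps the decision explicit in user code rather than having pyarrow silently override the keyword.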

> [Python] Overflow in Date32 column conversion to pandas
> -------------------------------------------------------
>
>                 Key: ARROW-3176
>                 URL: https://issues.apache.org/jira/browse/ARROW-3176
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.10.0
>            Reporter: Florian Jetter
>            Priority: Minor
>             Fix For: 1.0.0
>
>
> When converting an arrow column holding a {{Date32Array}} to {{pandas}} there 
> seems to be an overflow at the date {{2262-04-12}} such that the type and 
> value are wrong. The issue only occurs for columns, not for arrays.
> Running on debian 9.5 w/ python2 gives
>   
> {code}
> In [1]: import numpy as np
> In [2]: import datetime
> In [3]: import pyarrow as pa
> In [4]: pa.__version__
> Out[4]: '0.10.0'
> In [5]: arr = pa.array(np.array([datetime.date(2262, 4, 12)], dtype='datetime64[D]'))
> In [6]: arr.to_pandas(date_as_object=False)
> Out[6]: array(['2262-04-12'], dtype='datetime64[D]')
> In [7]: pa.column('name', arr).to_pandas(date_as_object=False)
> Out[7]:
> 0 1677-09-21 00:25:26.290448384
> Name: name, dtype: datetime64[ns]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
