Kevin Glasson created ARROW-7856:
------------------------------------
Summary: to_pandas() Causing datetimes > pd.Timestamp.max to wrap around
Key: ARROW-7856
URL: https://issues.apache.org/jira/browse/ARROW-7856
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.1
Environment: Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
Python 3.7.3
In [3]: pa.__version__
Out[3]: '0.15.1'
In [4]: pd.__version__
Out[4]: '0.25.2'
Reporter: Kevin Glasson
When writing a dataframe that contains `datetime.datetime` values in an object
column, any datetime that is greater than pd.Timestamp.max or less than
pd.Timestamp.min is wrapped around.
For reference, these are the timestamp max and min values.
{code:java}
In [43]: pd.Timestamp.max
Out[43]: Timestamp('2262-04-11 23:47:16.854775807')
In [44]: pd.Timestamp.min
Out[44]: Timestamp('1677-09-21 00:12:43.145225')
{code}
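A guard along the following lines could flag out-of-range values before they are written. This is only a minimal sketch using the standard library: the helper name `fits_ns_range` is hypothetical, and the bounds are the two values shown above, truncated to microsecond precision.

```python
import datetime

# Bounds copied from the pd.Timestamp.min / pd.Timestamp.max output above,
# truncated to the microsecond precision that datetime.datetime supports.
NS_RANGE_MIN = datetime.datetime(1677, 9, 21, 0, 12, 43, 145225)
NS_RANGE_MAX = datetime.datetime(2262, 4, 11, 23, 47, 16, 854775)


def fits_ns_range(dt):
    """Return True if a naive datetime lies within the bounds above.

    Hypothetical helper for illustration; not part of pandas or pyarrow.
    """
    return NS_RANGE_MIN <= dt <= NS_RANGE_MAX


if __name__ == "__main__":
    for value in (datetime.datetime(2020, 1, 1), datetime.datetime(2262, 4, 12)):
        print(value, fits_ns_range(value))
```

Values for which the check returns False are the ones affected by this report.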
To reproduce the error using pandas:
{code:java}
In [49]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})
In [50]: df
Out[50]:
A
0 2262-04-12 00:00:00
In [51]: df.to_parquet("datetimething.parquet")
In [52]: pd.read_parquet("datetimething.parquet")
Out[52]:
A
0 1677-09-21 00:25:26.290448384
{code}
I have narrowed it down to the conversion step: the wraparound happens when
a `pa.Table` is converted using the `to_pandas()` method.
{code:java}
In [30]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})
In [31]: tf = pa.Table.from_pandas(df)
In [32]: tf.columns
Out[32]: [<pyarrow.lib.ChunkedArray object at 0x7f23884deef8>
[
[
2262-04-12 00:00:00.000000
]
]
]
In [33]: tf.to_pandas()
Out[33]: A
0 1677-09-21 00:25:26.290448384
{code}
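The wrapped value is consistent with a signed 64-bit overflow of nanoseconds since the Unix epoch, although that cause is an assumption and is not confirmed in this report. The sketch below uses only the standard library, and its helper names are hypothetical; it reproduces the arithmetic for the value used above.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def to_ns(dt):
    """Nanoseconds since the Unix epoch for a naive datetime (exact integer math)."""
    delta = dt - EPOCH
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def wrap_int64(n):
    """Reduce an arbitrary integer to the signed 64-bit range, as an overflow would."""
    return (n + 2**63) % 2**64 - 2**63


def from_ns(ns):
    """Split nanoseconds since the epoch into (datetime at whole seconds, leftover ns)."""
    seconds, frac_ns = divmod(ns, 10**9)
    return EPOCH + datetime.timedelta(seconds=seconds), frac_ns


if __name__ == "__main__":
    original = datetime.datetime(2262, 4, 12)
    wrapped = wrap_int64(to_ns(original))
    when, frac_ns = from_ns(wrapped)
    # Prints: 1677-09-21 00:25:26 290448384
    print(when, frac_ns)
```

The result, 1677-09-21 00:25:26 plus 290448384 ns, matches the value returned by `tf.to_pandas()` above.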