Tim Swast created ARROW-5450:
--------------------------------
Summary: [Python] TimestampArray.to_pylist() fails with
OverflowError: Python int too large to convert to C long
Key: ARROW-5450
URL: https://issues.apache.org/jira/browse/ARROW-5450
Project: Apache Arrow
Issue Type: Bug
Reporter: Tim Swast
When I attempt to round-trip a list of datetime objects through pyarrow and
back, I get an OverflowError: Python int too large to convert to C long. The
values are moderately large: outside the range that can be represented at
nanosecond precision, but within the range of microsecond precision.
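The range claim can be checked with integer arithmetic alone. The following is a minimal sketch using only the standard library; the helper names to_micros and fits_int64 are hypothetical and not part of pyarrow or pandas. Under the assumption that timestamps are stored as signed 64-bit offsets from the Unix epoch, it shows that both extreme values fit as microseconds but not as nanoseconds.

```python
import datetime

# Bounds of a signed 64-bit integer (a C long on most 64-bit platforms).
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

EPOCH = datetime.datetime(1970, 1, 1)


def to_micros(dt):
    """Microseconds since the Unix epoch for a naive datetime (exact integer math)."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def fits_int64(value):
    """True if value can be stored in a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


extremes = (
    datetime.datetime(1, 1, 1, 0, 0, 0),
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
)
for dt in extremes:
    micros = to_micros(dt)
    nanos = micros * 1000
    # Prints whether the microsecond and nanosecond counts fit in int64.
    print(dt.isoformat(), fits_int64(micros), fits_int64(nanos))
```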
pyarrow version:
{noformat}
$ pip freeze | grep pyarrow
pyarrow==0.13.0
{noformat}
Reproduction:
{code:python}
import datetime
import pandas
import pyarrow
import pytz
timestamp_rows = [
    datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
    None,
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=pytz.utc),
    datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
]
timestamp_array = pyarrow.array(timestamp_rows, pyarrow.timestamp("us", tz="UTC"))
timestamp_roundtrip = timestamp_array.to_pylist()
# ---------------------------------------------------------------------------
# OverflowError Traceback (most recent call last)
# <ipython-input-25-4a798e917c20> in <module>
# ----> 1 timestamp_roundtrip = timestamp_array.to_pylist()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi in __iter__()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib.TimestampValue.as_py()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib._datetime_conversion_functions.lambda5()
#
# pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
#
# pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
#
# OverflowError: Python int too large to convert to C long
{code}
For good measure, I also tested timezone-naive timestamps and got the same
error:
{code:python}
naive_rows = [
    datetime.datetime(1, 1, 1, 0, 0, 0),
    None,
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
    datetime.datetime(1970, 1, 1, 0, 0, 0),
]
naive_array = pyarrow.array(naive_rows, pyarrow.timestamp("us", tz=None))
naive_roundtrip = naive_array.to_pylist()
# ---------------------------------------------------------------------------
# OverflowError Traceback (most recent call last)
# <ipython-input-27-0c32e563d44a> in <module>
# ----> 1 naive_roundtrip = naive_array.to_pylist()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi in __iter__()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib.TimestampValue.as_py()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib._datetime_conversion_functions.lambda5()
#
# pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
#
# pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
#
# OverflowError: Python int too large to convert to C long
{code}
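Both tracebacks pass through pandas Timestamp.__new__(), and pandas timestamps of this era store nanoseconds in a 64-bit integer, so that step appears to be where the overflow occurs. As an illustration only (this is not pyarrow's implementation and not a proposed patch), a microsecond count can be decoded with the standard library alone, without any nanosecond intermediate. The name micros_to_datetime is hypothetical.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def micros_to_datetime(micros):
    """Decode microseconds since the Unix epoch into a naive datetime.

    Returns None for a null value. timedelta keeps days, seconds, and
    microseconds as separate integers, so no nanosecond value is ever formed.
    """
    if micros is None:
        return None
    return EPOCH + datetime.timedelta(microseconds=micros)


# The same extremes as the reproduction, written as raw microsecond offsets.
raw_values = [-62135596800000000, None, 253402300799999999, 0]
decoded = [micros_to_datetime(v) for v in raw_values]
for value in decoded:
    print(value)
```

For timezone-aware data the result could be tagged afterwards with replace(tzinfo=datetime.timezone.utc); that step is left out to keep the sketch short.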