[jira] [Comment Edited] (ARROW-5450) [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too large to convert to C long

Tim Swast (JIRA) Wed, 05 Jun 2019 14:25:35 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857049#comment-16857049
 ]


Tim Swast edited comment on ARROW-5450 at 6/5/19 9:24 PM:
----------------------------------------------------------

Since datetime.datetime objects don't support nanosecond precision, a pandas 
Timestamp is a good default for nanosecond-precision columns. But for 
microsecond-precision columns, I'd always prefer a datetime.datetime object.
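
To make the range difference concrete, here is a rough sketch using only the standard library. It assumes timestamps are stored as signed 64-bit counts since the Unix epoch (which is how Arrow timestamp columns are laid out); the constant and variable names are just for illustration.

```python
import datetime

INT64_MAX = 2**63 - 1
EPOCH = datetime.datetime(1970, 1, 1)
ONE_MICROSECOND = datetime.timedelta(microseconds=1)

# Latest instant that fits in int64 *nanoseconds* since the epoch.
# timedelta only resolves microseconds, so drop the last three digits first.
max_ns_datetime = EPOCH + datetime.timedelta(microseconds=INT64_MAX // 1000)

# datetime.datetime.max expressed as *microseconds* since the epoch.
max_us = (datetime.datetime.max - EPOCH) // ONE_MICROSECOND

print(max_ns_datetime.year)        # nanosecond range ends in the year 2262
print(max_us < INT64_MAX)          # year 9999 fits in int64 microseconds
print(max_us * 1000 > INT64_MAX)   # but overflows int64 nanoseconds
```

So a year-9999 value is fine as a microsecond count, but anything that widens it to nanoseconds (as a nanosecond-based pandas Timestamp does) runs out of bits.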


was (Author: tswast):
Since datetime.datetime objects don't support nanosecond precision, pandas 
Timestamp is a good default with nanosecond precision columns. But with 
microsecond precision objects, I'd always prefer a datetime.datetime object.

> [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too 
> large to convert to C long
> -------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5450
>                 URL: https://issues.apache.org/jira/browse/ARROW-5450
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Tim Swast
>            Priority: Major
>
> When I attempt to round-trip a list of datetime objects whose values are 
> outside the range representable at nanosecond precision, but within the 
> range at microsecond precision, to pyarrow and back, I get an 
> OverflowError: Python int too large to convert to C long.
> pyarrow version:
> {noformat}
> $ pip freeze | grep pyarrow
> pyarrow==0.13.0
> {noformat}
>  
> Reproduction:
> {code:python}
> import datetime
> import pandas
> import pyarrow
> import pytz
> timestamp_rows = [
>     datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
>     None,
>     datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=pytz.utc),
>     datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
> ]
> timestamp_array = pyarrow.array(timestamp_rows, pyarrow.timestamp("us", tz="UTC"))
> timestamp_roundtrip = timestamp_array.to_pylist()
> # ---------------------------------------------------------------------------
> # OverflowError Traceback (most recent call last)
> # <ipython-input-25-4a798e917c20> in <module>
> # ----> 1 timestamp_roundtrip = timestamp_array.to_pylist()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi in __iter__()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib.TimestampValue.as_py()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}
> For good measure, I also tested with timezone-naive timestamps with the same 
> error:
> {code:python}
> naive_rows = [
>     datetime.datetime(1, 1, 1, 0, 0, 0),
>     None,
>     datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
>     datetime.datetime(1970, 1, 1, 0, 0, 0),
> ]
> naive_array = pyarrow.array(naive_rows, pyarrow.timestamp("us", tz=None))
> naive_roundtrip = naive_array.to_pylist()
> # ---------------------------------------------------------------------------
> # OverflowError Traceback (most recent call last)
> # <ipython-input-27-0c32e563d44a> in <module>
> # ----> 1 naive_roundtrip = naive_array.to_pylist()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi in __iter__()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib.TimestampValue.as_py()
> #
> # ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}
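
For anyone hitting this in the meantime, one possible workaround is to convert the raw microsecond counts yourself instead of going through a pandas Timestamp. The sketch below is an assumption about how that could look, not what pyarrow does internally: micros_to_datetime is a hypothetical helper, and it expects you to already have the values as plain Python ints (microseconds since the Unix epoch) or None.

```python
import datetime

_EPOCH = datetime.datetime(1970, 1, 1)
_ONE_MICROSECOND = datetime.timedelta(microseconds=1)


def micros_to_datetime(micros, utc=False):
    """Hypothetical helper: int microseconds since the epoch -> datetime.

    Uses integer timedelta arithmetic, so there is no detour through
    nanoseconds or floating-point seconds. None (a null) is passed through.
    """
    if micros is None:
        return None
    value = _EPOCH + datetime.timedelta(microseconds=micros)
    if utc:
        # The stored count is relative to the UTC epoch, so just attach UTC.
        value = value.replace(tzinfo=datetime.timezone.utc)
    return value


# Same values as the naive reproduction above, expressed as microsecond counts.
naive_rows = [
    datetime.datetime(1, 1, 1, 0, 0, 0),
    None,
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
    datetime.datetime(1970, 1, 1, 0, 0, 0),
]
micros = [
    None if row is None else (row - _EPOCH) // _ONE_MICROSECOND
    for row in naive_rows
]
roundtrip = [micros_to_datetime(m) for m in micros]
print(roundtrip == naive_rows)
```

Adding a timedelta to the epoch is used here rather than datetime.utcfromtimestamp() because the latter goes through float seconds, which as far as I can tell cannot hold microsecond resolution for values this far from the epoch.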



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
