[ https://issues.apache.org/jira/browse/ARROW-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215958#comment-16215958 ]
Bryan Cutler commented on ARROW-1680:
-------------------------------------
Thanks [~wesmckinn]. I'm also seeing another related issue with dates:
{code}
import pandas as pd
import pyarrow as pa
import datetime
arr = pa.array([datetime.date(2017, 10, 23)])
c = pa.Column.from_array("d", arr)
s = c.to_pandas()
print(s)
# 0 2017-10-23
# Name: d, dtype: datetime64[ns]
result = pa.Array.from_pandas(s, type=pa.date32())
print(result)
"""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
  File "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", line 28, in array_format
    values.append(value_format(x, 0))
  File "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", line 49, in value_format
    return repr(x)
  File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
  File "pyarrow/scalar.pxi", line 137, in pyarrow.lib.Date32Value.as_py (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:20368)
ValueError: year is out of range
"""
{code}
This one is a little more troublesome because I can't find a decent workaround.
Should I open a separate JIRA for this?
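
For anyone following along, here is a minimal sketch of the arithmetic that seems to be involved, using only the standard library. date32 stores days since the UNIX epoch, while the pandas series holds nanoseconds since the epoch. The assumption here, which the traceback above does not confirm, is that the nanosecond values are not being rescaled to days. The names EPOCH, NS_PER_DAY, date_to_days and ns_to_days are illustrative helpers, not pyarrow APIs.

```python
import datetime

# Illustrative constants and helpers (assumptions, not part of pyarrow).
EPOCH = datetime.date(1970, 1, 1)
NS_PER_DAY = 86400 * 10**9


def date_to_days(d):
    """Days since the UNIX epoch, the unit a date32 value stores."""
    return (d - EPOCH).days


def ns_to_days(ns):
    """Rescale nanoseconds since the epoch to whole days (floor division)."""
    return ns // NS_PER_DAY


d = datetime.date(2017, 10, 23)
days = date_to_days(d)

# What a datetime64[ns] series would hold for midnight on that date.
ns = days * NS_PER_DAY

# With the rescaling step, the value round-trips to the original date.
assert EPOCH + datetime.timedelta(days=ns_to_days(ns)) == d

# Without it, the raw nanosecond count is far outside any valid date range.
try:
    EPOCH + datetime.timedelta(days=ns)
    raw_ns_fits = True
except OverflowError:
    raw_ns_fits = False

print(days, raw_ns_fits)
```

If that assumption holds, the fix would be the same kind of unit conversion as for timestamps, just with days as the target unit.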
> [Python] Timestamp unit change not done in from_pandas() conversion
> -------------------------------------------------------------------
>
> Key: ARROW-1680
> URL: https://issues.apache.org/jira/browse/ARROW-1680
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Bryan Cutler
> Assignee: Wes McKinney
> Fix For: 0.8.0
>
>
> Calling {{Array.from_pandas}} with a pandas.Series of timestamps that have
> 'ns' unit, while specifying a type with 'us' unit to coerce to, causes
> problems. When the series has timestamps with a timezone, the requested unit
> is ignored. When the series does not have a timezone, the unit is applied to
> the type, but printing the array raises an OverflowError.
> {noformat}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> from datetime import datetime
> >>> s = pd.Series([datetime.now()])
> >>> s_nyc = s.dt.tz_localize('tzlocal()').dt.tz_convert('America/New_York')
> >>> arr = pa.Array.from_pandas(s_nyc, type=pa.timestamp('us', tz='America/New_York'))
> >>> arr.type
> TimestampType(timestamp[ns, tz=America/New_York])
> >>> arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
> >>> arr.type
> TimestampType(timestamp[us])
> >>> print(arr)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
>     values = array_format(self, window=10)
>   File "pyarrow/formatting.py", line 28, in array_format
>     values.append(value_format(x, 0))
>   File "pyarrow/formatting.py", line 49, in value_format
>     return repr(x)
>   File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
>     return repr(self.as_py())
>   File "pyarrow/scalar.pxi", line 240, in pyarrow.lib.TimestampValue.as_py (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:21600)
>     return converter(value, tzinfo=tzinfo)
>   File "pyarrow/scalar.pxi", line 204, in pyarrow.lib.lambda5 (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:7295)
>     TimeUnit_MICRO: lambda x, tzinfo: pd.Timestamp(
>   File "pandas/_libs/tslib.pyx", line 402, in pandas._libs.tslib.Timestamp.__new__ (pandas/_libs/tslib.c:10051)
>   File "pandas/_libs/tslib.pyx", line 1467, in pandas._libs.tslib.convert_to_tsobject (pandas/_libs/tslib.c:27665)
> OverflowError: Python int too large to convert to C long
> {noformat}
> A workaround is to manually convert the values with astype before calling from_pandas:
> {noformat}
> >>> arr = pa.Array.from_pandas(s.values.astype('datetime64[us]'))
> >>> arr.type
> TimestampType(timestamp[us])
> >>> print(arr)
> <pyarrow.lib.TimestampArray object at 0x7f6a67e0a3c0>
> [
> Timestamp('2017-10-17 11:04:44.308233')
> ]
> >>>
> {noformat}
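
To illustrate why the missing unit change in the quoted description ends in an overflow, here is a rough standard-library sketch. It assumes the array keeps the original nanosecond integers while its type is labeled 'us', which matches the issue title but is not shown directly in the traceback. EPOCH, to_ns and from_us are hypothetical helpers for this illustration, not pyarrow or pandas functions.

```python
from datetime import datetime, timedelta

# Hypothetical helpers for illustration only.
EPOCH = datetime(1970, 1, 1)


def to_ns(ts):
    """Nanoseconds since the epoch for a naive datetime."""
    delta = ts - EPOCH
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 10**9 + delta.microseconds * 1000


def from_us(us):
    """Interpret an integer as microseconds since the epoch."""
    return EPOCH + timedelta(microseconds=us)


ts = datetime(2017, 10, 17, 11, 4, 44, 308233)
ns_value = to_ns(ts)

# Rescaling ns -> us before interpreting the value round-trips cleanly.
converted = from_us(ns_value // 1000)

# Reading the unconverted ns value as microseconds lands about 1000x too
# far in the future, beyond what datetime can represent.
try:
    from_us(ns_value)
    mislabeled_fails = False
except OverflowError:
    mislabeled_fails = True

print(converted, mislabeled_fails)
```

The astype workaround in the description performs the same rescaling on the NumPy side before the values reach from_pandas.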