[
https://issues.apache.org/jira/browse/ARROW-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118091#comment-17118091
]
Mark Waddle commented on ARROW-8967:
------------------------------------
Thanks [~wesm]. I think you're right that this is simply not supported in
pandas: its timestamps use the ns unit, and this value falls outside the range
that unit can represent.
{noformat}
>>> pd.to_datetime(pd.DataFrame([[-61552915200000]], columns=['ts'])['ts'],
...                unit='ms')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 728, in to_datetime
    values = convert_listlike(arg._values, format)
  File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 332, in _convert_listlike_datetimes
    arg, mask, unit, errors=errors
  File "pandas/_libs/tslib.pyx", line 377, in pandas._libs.tslib.array_with_unit_to_datetime
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: cannot convert input with unit 'ms'
{noformat}
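For anyone landing here later, a rough illustration of why the ns unit is the problem. This is plain Python rather than pandas or pyarrow internals, and the helper names are made up for this sketch. It assumes timestamps are stored as a signed 64-bit count since the Unix epoch, so scaling this millisecond value to nanoseconds overflows, while a Python datetime (years 1 to 9999) can still hold it.

```python
from datetime import datetime, timedelta

# Assumption: the timestamp is stored as a signed 64-bit integer count
# of nanoseconds since 1970-01-01.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
EPOCH = datetime(1970, 1, 1)


def fits_in_int64_ns(value_ms):
    """Return True if a millisecond timestamp survives scaling to nanoseconds."""
    value_ns = value_ms * 1_000_000  # ms -> ns
    return INT64_MIN <= value_ns <= INT64_MAX


def ms_to_datetime(value_ms):
    """Convert epoch milliseconds to a naive Python datetime."""
    return EPOCH + timedelta(milliseconds=value_ms)


ts_ms = -61552915200000
print(fits_in_int64_ns(ts_ms))  # False
print(ms_to_datetime(ts_ms))    # 0019-06-20 00:00:00
```

So the value itself is a valid date; it just cannot be expressed once the unit is forced to ns.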
Closing this issue.
> [Python] [Parquet] pyarrow.Table.to_pandas() fails to convert valid
> TIMESTAMP_MILLIS to pandas timestamp
> --------------------------------------------------------------------------------------------------------
>
> Key: ARROW-8967
> URL: https://issues.apache.org/jira/browse/ARROW-8967
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.17.0
> Reporter: Mark Waddle
> Priority: Major
>
> Converting a table to pandas with a valid millisecond timestamp value of
> -61552915200000 (0019-06-20) results in the following error:
> {noformat}
>   File "pyarrow/array.pxi", line 587, in pyarrow.lib._PandasConvertible.to_pandas
>   File "pyarrow/table.pxi", line 1640, in pyarrow.lib.Table._to_pandas
>   File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 766, in table_to_blockmanager
>     blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
>   File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 1102, in _table_to_blocks
>     list(extension_columns.keys()))
>   File "pyarrow/table.pxi", line 1107, in pyarrow.lib.table_to_blocks
>   File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Casting from timestamp[ms] to timestamp[ns] would result in out of bounds timestamp: -61552915200000
> {noformat}
> As it stands, pyarrow cannot convert this Parquet file to pandas.
> I would like to be able to choose the timestamp unit when converting to
> pandas, much like you can when writing to a Parquet file.