[ 
https://issues.apache.org/jira/browse/ARROW-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118379#comment-17118379
 ] 

Joris Van den Bossche commented on ARROW-8967:
----------------------------------------------

Yes, we have several issues about this already.
This is a current limitation in pandas: it cannot represent those "out of
bounds" timestamps because it only supports nanosecond resolution.

However, we should still be able to convert to pandas by using datetime
objects instead. See e.g. ARROW-5359, for which there is currently an open PR
to add a keyword for this.
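
To make the limitation concrete, here is a minimal sketch using only the
Python standard library (no pyarrow or pandas calls). It assumes that
nanosecond timestamps are stored as signed 64-bit integers counted from the
Unix epoch; the helper names fits_in_int64_ns and ms_to_datetime are
hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta

# Assumption: nanosecond timestamps are signed 64-bit integers since 1970-01-01.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NS_PER_MS = 1_000_000


def fits_in_int64_ns(ms: int) -> bool:
    """Return True if a millisecond timestamp survives scaling to nanoseconds."""
    return INT64_MIN <= ms * NS_PER_MS <= INT64_MAX


def ms_to_datetime(ms: int) -> datetime:
    """Convert milliseconds since the epoch to a naive datetime object."""
    return datetime(1970, 1, 1) + timedelta(milliseconds=ms)


value_ms = -61552915200000  # the value from the report below

print(fits_in_int64_ns(value_ms))  # False: too far in the past for int64 ns
print(ms_to_datetime(value_ms))    # 0019-06-20 00:00:00
```

The value is well within range for datetime.datetime (which starts at year
1), which is why falling back to datetime objects is a workable option when
the nanosecond cast is not.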

> [Python] [Parquet] pyarrow.Table.to_pandas() fails to convert valid 
> TIMESTAMP_MILLIS to pandas timestamp
> --------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8967
>                 URL: https://issues.apache.org/jira/browse/ARROW-8967
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.17.0
>            Reporter: Mark Waddle
>            Priority: Major
>
> Converting a table to pandas with a valid millis timestamp value of 
> -61552915200000 (0019-06-20) results in the following error:
> {noformat}
> File "pyarrow/array.pxi", line 587, in pyarrow.lib._PandasConvertible.to_pandas
>   File "pyarrow/table.pxi", line 1640, in pyarrow.lib.Table._to_pandas
>   File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 766, in table_to_blockmanager
>     blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
>   File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 1102, in _table_to_blocks
>     list(extension_columns.keys()))
>   File "pyarrow/table.pxi", line 1107, in pyarrow.lib.table_to_blocks
>   File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Casting from timestamp[ms] to timestamp[ns] would result in out of bounds timestamp: -61552915200000
> {noformat}
> As it stands, pyarrow cannot convert this Parquet file to pandas.
> I would like to be able to choose the timestamp unit when converting to 
> pandas, much like you can when writing to a Parquet file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
