[ 
https://issues.apache.org/jira/browse/ARROW-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118060#comment-17118060
 ] 

Wes McKinney commented on ARROW-8967:
-------------------------------------

I'm not sure if this is fixable, since pandas datetime64 data uses the 
nanosecond unit. [~jorisvandenbossche] do you know?

[~markwaddle] you can read this file fine into Arrow format, so it isn't true 
that "there is no way to read this file". You just can't convert out-of-bounds 
timestamps to pandas format at the moment.
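
To illustrate why the reported value cannot be represented at nanosecond resolution, here is a minimal standard-library sketch of the arithmetic. It assumes nanosecond timestamps are stored as signed 64-bit integers counted from the Unix epoch; the helper name `fits_in_ns` is hypothetical and is not part of pyarrow or pandas.

```python
from datetime import datetime, timedelta

# Assumption: nanosecond timestamps are signed 64-bit offsets from 1970-01-01.
# pandas additionally reserves the minimum value as NaT, which this sketch ignores.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NS_PER_MS = 1_000_000


def fits_in_ns(ms):
    """Return True if a millisecond timestamp survives conversion to int64 nanoseconds."""
    ns = ms * NS_PER_MS  # Python ints do not overflow, so the check is exact
    return INT64_MIN <= ns <= INT64_MAX


# The value from the issue report.
value_ms = -61552915200000

# Interpret it as a calendar date (proleptic Gregorian, as Python's datetime does).
as_datetime = datetime(1970, 1, 1) + timedelta(milliseconds=value_ms)

value_fits = fits_in_ns(value_ms)

print(as_datetime.date().isoformat(), "fits in int64 ns:", value_fits)
```

The value is a valid millisecond timestamp, but multiplying it by 1,000,000 gives roughly -6.2e19, which is below the int64 minimum of roughly -9.2e18, so the cast to timestamp[ns] has no representable result.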

> [Python] [Parquet] pyarrow.Table.to_pandas() fails to convert valid 
> TIMESTAMP_MILLIS to pandas timestamp
> --------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8967
>                 URL: https://issues.apache.org/jira/browse/ARROW-8967
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.17.0
>            Reporter: Mark Waddle
>            Priority: Major
>
> Reading a Parquet file with a valid TIMESTAMP_MILLIS value of -61552915200000 
> (0019-06-20) results in the following error:
> {noformat}
>   File "pyarrow/array.pxi", line 587, in pyarrow.lib._PandasConvertible.to_pandas
>   File "pyarrow/table.pxi", line 1640, in pyarrow.lib.Table._to_pandas
>   File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 766, in table_to_blockmanager
>     blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
>   File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 1102, in _table_to_blocks
>     list(extension_columns.keys()))
>   File "pyarrow/table.pxi", line 1107, in pyarrow.lib.table_to_blocks
>   File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Casting from timestamp[ms] to timestamp[ns] would result in out of bounds timestamp: -61552915200000
> {noformat}
> As it stands, there is no way to read this file.
> I would like to be able to choose the timestamp unit when reading, much like 
> you can when writing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
