[ https://issues.apache.org/jira/browse/ARROW-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Waddle updated ARROW-8967:
-------------------------------
    Description: 
Converting a table to pandas with a valid millisecond timestamp value of
-61552915200000 (0019-06-20) results in the following error:
{noformat}
  File "pyarrow/array.pxi", line 587, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1640, in pyarrow.lib.Table._to_pandas
  File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 766, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 1102, in _table_to_blocks
    list(extension_columns.keys()))
  File "pyarrow/table.pxi", line 1107, in pyarrow.lib.table_to_blocks
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Casting from timestamp[ms] to timestamp[ns] would result in out of bounds timestamp: -61552915200000
{noformat}

As it stands, pyarrow cannot convert this Parquet file to pandas.
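The value itself is fine; the cast overflows. pandas stores `datetime64[ns]` as a signed 64-bit count of nanoseconds since the Unix epoch, which covers only about 1677-09-21 to 2262-04-11. Multiplying -61552915200000 ms by 10^6 gives roughly -6.16 × 10^19 ns, far below the int64 minimum of roughly -9.22 × 10^18. The stdlib-only sketch that follows checks both facts; it assumes the value is epoch milliseconds in UTC, and the helper names are illustrative, not pyarrow API.

```python
from datetime import datetime, timedelta, timezone

# Millisecond value from the report (assumed: milliseconds since the Unix epoch, UTC)
REPORTED_MILLIS = -61552915200000

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# datetime64[ns] is a signed 64-bit count of nanoseconds
# (pandas additionally reserves the minimum value for NaT)
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NS_PER_MS = 1_000_000


def millis_to_datetime(ms: int) -> datetime:
    """Decode epoch milliseconds with Python's datetime.

    datetime covers years 1-9999 (proleptic Gregorian calendar), so this
    value converts without overflow.
    """
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def fits_in_int64_ns(ms: int) -> bool:
    """Return True if the ms -> ns cast stays inside the int64 range.

    Python ints never overflow, so the product is exact and can be range-checked.
    """
    ns = ms * NS_PER_MS
    return INT64_MIN <= ns <= INT64_MAX


if __name__ == "__main__":
    print(millis_to_datetime(REPORTED_MILLIS).date())  # 0019-06-20
    print(fits_in_int64_ns(REPORTED_MILLIS))  # False
```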

I would like to be able to choose the timestamp unit when converting to pandas,
much like you can when writing to a Parquet file.
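A minimal sketch of the requested behavior, written in plain Python so it runs without pyarrow or pandas. The function `convert_millis` and its `unit` values are hypothetical, not pyarrow API: integer units rescale and bounds-check, while `"object"` falls back to `datetime` objects, which cover the full year 1-9999 range. For reference, later pyarrow releases added a `timestamp_as_object` option to `to_pandas` that takes a similar object route, and pandas 2.0 supports non-nanosecond `datetime64` resolutions; check the documentation for the versions you use.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Union

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# Hypothetical unit names; each maps to a multiplier from milliseconds
MS_TO_UNIT = {"ms": 1, "us": 1_000, "ns": 1_000_000}


def convert_millis(values: List[int], unit: str = "ns") -> List[Union[int, datetime]]:
    """Hypothetical converter with a caller-chosen target unit.

    * "ms", "us", "ns": rescale to integer ticks of that unit and raise if any
      value falls outside int64, mirroring the ArrowInvalid in the report.
    * "object": return timezone-aware datetime objects (years 1-9999).
    """
    if unit == "object":
        return [UNIX_EPOCH + timedelta(milliseconds=v) for v in values]
    if unit not in MS_TO_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    factor = MS_TO_UNIT[unit]
    result: List[Union[int, datetime]] = []
    for v in values:
        scaled = v * factor
        # Refuse to wrap around silently; report the offending value instead
        if not INT64_MIN <= scaled <= INT64_MAX:
            raise OverflowError(f"{v} ms is out of bounds for unit {unit!r}")
        result.append(scaled)
    return result
```

With the reported value, `convert_millis([-61552915200000], "ms")` returns it unchanged, `"us"` also fits, `"ns"` raises `OverflowError`, and `"object"` yields a `datetime` on 0019-06-20. The explicit bounds check keeps the integer path safe in the same spirit as the check that produced the ArrowInvalid above.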

  was:
Reading a Parquet file with a valid TIMESTAMP_MILLIS value of -61552915200000
(0019-06-20) results in the following error:
{noformat}
  File "pyarrow/array.pxi", line 587, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1640, in pyarrow.lib.Table._to_pandas
  File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 766, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/Users/mark/.local/share/virtualenvs/parquetpy-BNIqCtDj/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 1102, in _table_to_blocks
    list(extension_columns.keys()))
  File "pyarrow/table.pxi", line 1107, in pyarrow.lib.table_to_blocks
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Casting from timestamp[ms] to timestamp[ns] would result in out of bounds timestamp: -61552915200000
{noformat}

As it stands, pyarrow cannot convert this Parquet file to pandas.

I would like to be able to choose the timestamp unit when reading, much like
you can when writing.


> [Python] [Parquet] pyarrow.Table.to_pandas() fails to convert valid TIMESTAMP_MILLIS to pandas timestamp
> --------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8967
>                 URL: https://issues.apache.org/jira/browse/ARROW-8967
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.17.0
>            Reporter: Mark Waddle
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
