[
https://issues.apache.org/jira/browse/ARROW-17192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621174#comment-17621174
]
Alenka Frim commented on ARROW-17192:
-------------------------------------
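This is a known issue: pandas, for now, only supports the {{datetime64}} data
type in nanosecond resolution, which can only represent dates between
1677-09-21 and 2262-04-11. When you write to a feather file, the pandas
dataframe is converted to an Arrow table and the conversion infers microsecond
resolution for the datetime column. When the file is read back, PyArrow tries
to convert that column to {{datetime64[ns]}}, and dates outside that range
cannot be represented.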
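A minimal sketch of the two facts above, assuming a recent pandas and PyArrow installation: {{pd.Timestamp.min}} and {{pd.Timestamp.max}} show the bounds of the nanosecond-resolution type, and PyArrow infers a microsecond-resolution timestamp when it builds an array from Python {{datetime}} objects.
{code:python}
from datetime import datetime

import pandas as pd
import pyarrow as pa

# Bounds of pandas' nanosecond-resolution timestamp type.
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

# PyArrow infers microsecond resolution for Python datetime objects,
# so the 1654 date is representable on the Arrow side.
arr = pa.array([datetime(1654, 1, 1), datetime(1920, 1, 1)])
print(arr.type)  # timestamp[us]
{code}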
As a workaround, you can use {{feather.read_table}} to read the feather file
into an Arrow table and then use {{to_pandas}} to convert it into a pandas
dataframe, passing the {{timestamp_as_object=True}} keyword so that PyArrow
doesn't try to convert the timestamps to {{datetime64[ns]}}:
{code:python}
>>> import pyarrow.feather as feather
>>> feather.read_table("to_trash.feather").to_pandas(timestamp_as_object=True)
date
0 1654-01-01 00:00:00
1 1920-01-01 00:00:00
{code}
But I think we should still pass {{**kwargs}} through {{read_feather}} to
{{to_pandas()}} so that the {{timestamp_as_object=True}} keyword can be
specified there as well. So I am keeping the Jira open and will try to make a
PR for it next week. Contributions are also welcome; I can help if needed.
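A rough sketch of what such a passthrough could look like. {{read_feather_passthrough}} is a hypothetical name, not part of PyArrow, and its parameters are assumed to mirror {{read_feather}}'s {{columns}}, {{use_threads}} and {{memory_map}}; check them against your PyArrow version. Any extra keywords go straight to {{Table.to_pandas()}}.
{code:python}
import os
import tempfile
from datetime import datetime
from typing import Any, Optional, Sequence

import pandas as pd
import pyarrow.feather as feather


def read_feather_passthrough(
    source: str,
    columns: Optional[Sequence[str]] = None,
    use_threads: bool = True,
    memory_map: bool = False,
    **kwargs: Any,
) -> pd.DataFrame:
    """Hypothetical read_feather() variant that forwards **kwargs to to_pandas()."""
    table = feather.read_table(
        source, columns=columns, memory_map=memory_map, use_threads=use_threads
    )
    # Extra keywords such as timestamp_as_object=True are handed to to_pandas().
    return table.to_pandas(use_threads=use_threads, **kwargs)


# Usage: round-trip the dates from the issue through a temporary feather file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "to_trash.feather")
    pd.DataFrame({"date": [datetime(1654, 1, 1), datetime(1920, 1, 1)]}).to_feather(path)
    df = read_feather_passthrough(path, timestamp_as_object=True)

print(df["date"].tolist())
{code}
Forwarding {{use_threads}} to both the read and the conversion keeps the wrapper's behavior close to a plain read followed by {{to_pandas()}}; only the extra keywords change.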
> [Python] .to_pandas can't read_feather if a date column contains dates
> before 1677 and after 2262
> --------------------------------------------------------------------------------------------------
>
> Key: ARROW-17192
> URL: https://issues.apache.org/jira/browse/ARROW-17192
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Environment: Any environment
> Reporter: Adrien Pacifico
> Priority: Major
>
> A feather file with a column containing dates earlier than 1677 or later than
> 2262 cannot be read with pandas, due to the `.to_pandas` method.
> To reproduce the issue:
> {code:python}
> # Create the feather file
> import pandas as pd
> from datetime import datetime
> df = pd.DataFrame({"date": [
>     datetime.fromisoformat("1654-01-01"),
>     datetime.fromisoformat("1920-01-01"),
> ]})
> df.to_feather("to_trash.feather")
> # Read the feather file back (fails because of the 1654 date)
> from pyarrow.feather import read_feather
> read_feather("to_trash.feather")
> {code}
>
> I think that the expected behavior would be to get an object column
> containing datetime objects.
> I think that the problem comes from the `_array_like_to_pandas` method:
> [https://github.com/apache/arrow/blob/76f45a6892b13391fdede4c72934f75f6d56143c/python/pyarrow/array.pxi#L1584]
> or from `_to_pandas()`:
> [https://github.com/apache/arrow/blob/76f45a6892b13391fdede4c72934f75f6d56143c/python/pyarrow/array.pxi#L2742]
> or from `to_pandas`:
> [https://github.com/apache/arrow/blob/76f45a6892b13391fdede4c72934f75f6d56143c/python/pyarrow/array.pxi#L673]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)