[
https://issues.apache.org/jira/browse/ARROW-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858736#comment-16858736
]
Joris Van den Bossche commented on ARROW-1989:
----------------------------------------------
The mention of {{allow_truncated_timestamps=True}} led me towards Parquet,
and with that I can indeed reproduce it (although in the case I reproduce
below, the error occurs when converting _from_ pandas and not _to_ pandas):
{code:python}
# session imports (assumed): import pandas as pd, pyarrow as pa, pyarrow.parquet as pq
In [85]: df = pd.DataFrame({'a': [pd.Timestamp("2019-01-01 09:10:15.123456789")]})
In [86]: table = pa.Table.from_pandas(df)
In [88]: pq.write_table(table, '__test_datetime_highprecision.parquet')
...
ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 1546333815123456789
In [89]: pq.write_table(table, '__test_datetime_highprecision.parquet', allow_truncated_timestamps=True)
In [91]: pq.read_table('__test_datetime_highprecision.parquet').to_pandas()
Out[91]:
                           a
0 2019-01-01 09:10:15.123456
{code}
So indeed, in this case it would be nice to have a better error message that
also points to this option.
However, for this specific case: shouldn't we be able to solve it now that we
have NANOS support in Parquet writing? (see
https://issues.apache.org/jira/browse/ARROW-1957, which should be possible now
that the LogicalTypes PR is merged: PARQUET-1411)
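If NANOS writing is available, the truncation (and the need for
{{allow_truncated_timestamps}}) goes away entirely. A minimal sketch, assuming a
pyarrow version where Parquet format version '2.6' exposes the NANOS logical type
(on older versions {{use_deprecated_int96_timestamps=True}} is the usual
workaround); the output file name is only illustrative:
{code:python}
# Sketch: preserving nanosecond timestamps when writing Parquet.
# Assumes a pyarrow version where format version '2.6' (NANOS logical type)
# is supported; the output file name is just an example.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'a': [pd.Timestamp("2019-01-01 09:10:15.123456789")]})
table = pa.Table.from_pandas(df)

# With the newer format version the column stays timestamp[ns],
# so no lossy cast to timestamp[us] is needed at write time.
pq.write_table(table, 'nanos_example.parquet', version='2.6')

# The round trip keeps the full nanosecond precision.
print(pq.read_table('nanos_example.parquet').to_pandas())
{code}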
In general though, there will be other cases where it could be useful to
augment the arrow error message in Python.
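As a rough illustration of what augmenting the message at the Python level could
look like (only a sketch, not the actual pyarrow code; the wrapper name
{{to_pandas_with_hint}} is hypothetical):
{code:python}
# Sketch (hypothetical helper, not pyarrow's implementation): catch the Arrow
# error and re-raise it with a pandas-specific hint appended.
import pyarrow as pa

def to_pandas_with_hint(table):
    try:
        return table.to_pandas()
    except pa.ArrowInvalid as exc:
        if "would lose data" in str(exc):
            raise pa.ArrowInvalid(
                str(exc)
                + ". This conversion is needed because Pandas only supports "
                  "nanosecond timestamps; your data is likely out of the range "
                  "that can be represented with nanosecond resolution."
            ) from exc
        raise
{code}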
> [Python] Better UX on timestamp conversion to Pandas
> ----------------------------------------------------
>
> Key: ARROW-1989
> URL: https://issues.apache.org/jira/browse/ARROW-1989
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Uwe L. Korn
> Priority: Major
> Fix For: 0.14.0
>
>
> Converting timestamp columns to Pandas, users often have the problem that
> their dates fall outside the range that Pandas can represent with its
> nanosecond representation. Currently they simply see an Arrow exception and
> think that this problem is caused by Arrow. We should try to change the error
> from
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX
> {code}
> to something along the lines of
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data:
> XX. This conversion is needed because Pandas only supports nanosecond
> timestamps. Your data is likely out of the range that can be represented with
> nanosecond resolution.
> {code}