[
https://issues.apache.org/jira/browse/ARROW-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858736#comment-16858736
]
Joris Van den Bossche commented on ARROW-1989:
----------------------------------------------
The mention of {{allow_truncated_timestamps=True}} led me towards Parquet,
and with that I can indeed reproduce it (although in the case I reproduce
below, the error occurs when converting _from_ pandas and not _to_ pandas):
{code:python}
# session imports (assumed): import pandas as pd, pyarrow as pa, pyarrow.parquet as pq
In [85]: df = pd.DataFrame({'a': [pd.Timestamp("2019-01-01 09:10:15.123456789")]})
In [86]: table = pa.Table.from_pandas(df)
In [88]: pq.write_table(table, '__test_datetime_highprecision.parquet')
...
ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 1546333815123456789
In [89]: pq.write_table(table, '__test_datetime_highprecision.parquet', allow_truncated_timestamps=True)
In [91]: pq.read_table('__test_datetime_highprecision.parquet').to_pandas()
Out[91]:
                           a
0 2019-01-01 09:10:15.123456
{code}
So indeed, in this case it would be nice to have a better error message that
also points to this option.
However, for this specific case: shouldn't we be able to solve it now that we
have NANOS support in Parquet writing? (see
https://issues.apache.org/jira/browse/ARROW-1957, which should be possible now
that the LogicalTypes PR is merged: PARQUET-1411)
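If NANOS writing is available, the truncation (and the need for
{{allow_truncated_timestamps}}) goes away entirely. A minimal sketch, assuming a
pyarrow version where Parquet format version '2.6' exposes the NANOS logical type
(on older versions {{use_deprecated_int96_timestamps=True}} is the usual
workaround); the output file name is only illustrative:
{code:python}
# Sketch: preserving nanosecond timestamps when writing Parquet.
# Assumes a pyarrow version where format version '2.6' (NANOS logical type)
# is supported; the output file name is just an example.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'a': [pd.Timestamp("2019-01-01 09:10:15.123456789")]})
table = pa.Table.from_pandas(df)

# With the newer format version the column stays timestamp[ns],
# so no lossy cast to timestamp[us] is needed at write time.
pq.write_table(table, 'nanos_example.parquet', version='2.6')

# The round trip keeps the full nanosecond precision.
print(pq.read_table('nanos_example.parquet').to_pandas())
{code}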
In general though, there will be other cases where it could be useful to
augment the arrow error message in Python.
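As a rough illustration of what augmenting the message at the Python level could
look like (only a sketch, not the actual pyarrow code; the wrapper name
{{to_pandas_with_hint}} is hypothetical):
{code:python}
# Sketch (hypothetical helper, not pyarrow's implementation): catch the Arrow
# error and re-raise it with a pandas-specific hint appended.
import pyarrow as pa

def to_pandas_with_hint(table):
    try:
        return table.to_pandas()
    except pa.ArrowInvalid as exc:
        if "would lose data" in str(exc):
            raise pa.ArrowInvalid(
                str(exc)
                + ". This conversion is needed because Pandas only supports "
                  "nanosecond timestamps; your data is likely out of the range "
                  "that can be represented with nanosecond resolution."
            ) from exc
        raise
{code}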
> [Python] Better UX on timestamp conversion to Pandas
> ----------------------------------------------------
>
> Key: ARROW-1989
> URL: https://issues.apache.org/jira/browse/ARROW-1989
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Uwe L. Korn
> Priority: Major
> Fix For: 0.14.0
>
>
> Converting timestamp columns to Pandas, users often have the problem that
> their dates fall outside the range that Pandas can represent with its
> nanosecond representation. Currently they simply see an Arrow exception and
> think that this problem is caused by Arrow. We should try to change the error
> from
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX
> {code}
> to something along the lines of
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data:
> XX. This conversion is needed because Pandas only supports nanosecond
> timestamps. Your data is likely out of the range that can be represented with
> nanosecond resolution.
> {code}