[jira] [Commented] (ARROW-1989) [Python] Better UX on timestamp conversion to Pandas

Krisztian Szucs (JIRA) Sun, 09 Sep 2018 06:26:56 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608447#comment-16608447
 ]


Krisztian Szucs commented on ARROW-1989:
----------------------------------------

{code:python}
In [45]: pa.array([datetime.date(2018, 12, 12)], type=pa.timestamp('s'))
---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
<ipython-input-45-f6eb2418d6b7> in <module>()
----> 1 pa.array([datetime.date(2018, 12, 12)], type=pa.timestamp('s'))

~/Workspace/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
    169     else:
    170         # ConvertPySequence does strict conversion if type is explicitly passed
--> 171         return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
    172
    173

~/Workspace/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
     33     cdef shared_ptr[CChunkedArray] out
     34     with nogil:
---> 35         check_status(ConvertPySequence(sequence, mask, options, &out))
     36
     37     if out.get().num_chunks() == 1:

~/Workspace/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
     89             raise ArrowNotImplementedError(message)
     90         elif status.IsTypeError():
---> 91             raise ArrowTypeError(message)
     92         elif status.IsCapacityError():
     93             raise ArrowCapacityError(message)

ArrowTypeError: an integer is required (got type datetime.date)
{code}

However, the same call works with a datetime.datetime value:

{code:python}
In [46]: pa.array([datetime.datetime(2018, 12, 12)], type=pa.timestamp('s'))
Out[46]:
<pyarrow.lib.TimestampArray object at 0x11d243638>
[
  1544572800
]
{code}
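
Until the conversion accepts datetime.date directly, one possible workaround is to promote dates to midnight datetimes before building the array. This is only a sketch: `promote_dates` is a hypothetical helper, not part of pyarrow, and it assumes that midnight is the intended time of day.

```python
import datetime


def promote_dates(values):
    """Turn plain datetime.date values into datetime.datetime at midnight.

    datetime.datetime is a subclass of datetime.date, so existing
    datetimes are explicitly left untouched.
    """
    promoted = []
    for value in values:
        is_plain_date = (isinstance(value, datetime.date)
                         and not isinstance(value, datetime.datetime))
        if is_plain_date:
            value = datetime.datetime.combine(value, datetime.time.min)
        promoted.append(value)
    return promoted


promoted = promote_dates([datetime.date(2018, 12, 12)])
print(promoted)
```

With pyarrow installed, `promoted` could then be passed to `pa.array(promoted, type=pa.timestamp('s'))`, as in the working call above.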

I think we should have a general solution for extending the low-level errors 
with extra, Python-related context. 
The current error handling in Cython seems really lightweight: 
https://github.com/apache/arrow/blob/master/python/pyarrow/error.pxi#L71

Would it be OK to extend it with error-rewriting logic?
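
To make the question concrete, here is a rough pure-Python sketch of what such rewriting could look like. Everything in it is an assumption for illustration: `ArrowTypeError` is a local stand-in so the snippet runs without pyarrow, and `REWRITE_RULES`, `rewrite_error`, `convert` and `convert_with_context` are hypothetical names, not existing pyarrow APIs.

```python
import datetime


class ArrowTypeError(TypeError):
    """Local stand-in for pyarrow.lib.ArrowTypeError (assumption)."""


# Hypothetical rule table: (exception class, substring of the low-level
# message, Python-level hint appended to the message).
REWRITE_RULES = [
    (ArrowTypeError,
     "an integer is required (got type datetime.date)",
     "datetime.date values cannot be converted to a timestamp type here; "
     "pass datetime.datetime objects instead."),
]


def rewrite_error(exc):
    """Return a same-class exception with extra context, or exc unchanged."""
    message = str(exc)
    for exc_type, needle, hint in REWRITE_RULES:
        if isinstance(exc, exc_type) and needle in message:
            return exc_type("{}. {}".format(message, hint))
    return exc


def convert(values):
    """Hypothetical conversion that fails like the traceback above."""
    for value in values:
        if (isinstance(value, datetime.date)
                and not isinstance(value, datetime.datetime)):
            raise ArrowTypeError(
                "an integer is required (got type datetime.date)")
    return list(values)


def convert_with_context(values):
    """Call convert() and enrich any known low-level error."""
    try:
        return convert(values)
    except ArrowTypeError as exc:
        raise rewrite_error(exc) from exc
```

The original message is kept as a prefix, so anything matching on the low-level text would still work; whether the rewriting belongs in check_status or closer to the call sites is an open question.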

> [Python] Better UX on timestamp conversion to Pandas
> ----------------------------------------------------
>
>                 Key: ARROW-1989
>                 URL: https://issues.apache.org/jira/browse/ARROW-1989
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>             Fix For: 0.11.0
>
>
> When converting timestamp columns to Pandas, users often hit dates that lie 
> outside the range Pandas can represent with its nanosecond timestamps. 
> Currently they simply see an Arrow exception and assume that Arrow caused the 
> problem. We should try to change the error from
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX
> {code}
> to something along the lines of 
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 
> XX. This conversion is needed as Pandas does only support nanosecond 
> timestamps. Your data is likely out of the range that can be represented with 
> nanosecond resolution.
> {code}
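
The range limit mentioned in the issue description can be estimated with the standard library alone. A minimal sketch, assuming timestamps are stored as signed 64-bit counts of nanoseconds since the Unix epoch; `EPOCH`, `INT64_MAX`, `span`, `earliest` and `latest` are illustrative names only.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest nanosecond count a signed 64-bit value holds

# datetime only has microsecond resolution, so truncate the nanosecond
# count to whole microseconds before building the timedelta.
span = datetime.timedelta(microseconds=INT64_MAX // 1000)

earliest = EPOCH - span
latest = EPOCH + span

print("earliest:", earliest)
print("latest:  ", latest)
```

The bounds fall in the years 1677 and 2262, so any date outside that window cannot be expressed at nanosecond resolution.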



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
