[ https://issues.apache.org/jira/browse/ARROW-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608447#comment-16608447 ]
Krisztian Szucs commented on ARROW-1989:
----------------------------------------

{code:python}
In [45]: pa.array([datetime.date(2018, 12, 12)], type=pa.timestamp('s'))
---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
<ipython-input-45-f6eb2418d6b7> in <module>()
----> 1 pa.array([datetime.date(2018, 12, 12)], type=pa.timestamp('s'))

~/Workspace/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
    169     else:
    170         # ConvertPySequence does strict conversion if type is explicitly passed
--> 171         return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
    172
    173

~/Workspace/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
     33     cdef shared_ptr[CChunkedArray] out
     34     with nogil:
---> 35         check_status(ConvertPySequence(sequence, mask, options, &out))
     36
     37     if out.get().num_chunks() == 1:

~/Workspace/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
     89         raise ArrowNotImplementedError(message)
     90     elif status.IsTypeError():
---> 91         raise ArrowTypeError(message)
     92     elif status.IsCapacityError():
     93         raise ArrowCapacityError(message)

ArrowTypeError: an integer is required (got type datetime.date)
{code}

However, with datetime.datetime it works:

{code:python}
In [46]: pa.array([datetime.datetime(2018, 12, 12)], type=pa.timestamp('s'))
Out[46]:
<pyarrow.lib.TimestampArray object at 0x11d243638>
[
  1544572800
]
{code}

I think we should have a general solution for extending the low-level errors with extra, Python-related context. The current error handling in Cython seems really lightweight:
https://github.com/apache/arrow/blob/master/python/pyarrow/error.pxi#L71

Would it be OK to extend it with error-rewriting logic?

> [Python] Better UX on timestamp conversion to Pandas
> ----------------------------------------------------
>
>                 Key: ARROW-1989
>                 URL: https://issues.apache.org/jira/browse/ARROW-1989
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>             Fix For: 0.11.0
>
>
> Converting timestamp columns to Pandas, users often have the problem that
> they have dates that are larger than Pandas can represent with their
> nanosecond representation. Currently they simply see an Arrow exception and
> think that this problem is caused by Arrow. We should try to change the error
> from
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX
> {code}
> to something along the lines of
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data:
> XX. This conversion is needed as Pandas does only support nanosecond
> timestamps. Your data is likely out of the range that can be represented with
> nanosecond resolution.
> {code}
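
For discussion, a rough idea of what error rewriting could look like, written in plain Python rather than Cython. This is only a sketch: `HINTS` and `add_context` are hypothetical names that do not exist in pyarrow, the matching on message fragments is an assumption, and the second hint text is made up for illustration (the first one is the wording proposed in the issue description).

```python
# Hypothetical sketch: HINTS and add_context are illustrative names only,
# they are not part of pyarrow's error.pxi.

# Each entry pairs a fragment of a low-level message with Python-level context.
HINTS = [
    (
        "would lose data",
        "This conversion is needed as Pandas does only support nanosecond "
        "timestamps. Your data is likely out of the range that can be "
        "represented with nanosecond resolution.",
    ),
    (
        "got type datetime.date",
        # Assumed wording, based on the observation that datetime.datetime works.
        "A timestamp type was requested but a datetime.date was passed; "
        "datetime.datetime values are accepted.",
    ),
]


def add_context(message):
    """Return the message with the first matching hint appended, else unchanged."""
    for fragment, hint in HINTS:
        if fragment in message:
            return f"{message.rstrip('.')}. {hint}"
    return message


print(add_context("an integer is required (got type datetime.date)"))
```

Something like this could be applied to the message before the exception is raised, so the original low-level text stays intact and the hint is only appended when a known pattern matches.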