jorisvandenbossche commented on issue #37291: URL: https://github.com/apache/arrow/issues/37291#issuecomment-1687692800
Thanks for the report @mroeschke. Looking into it, this _might_ actually be a bug in pandas.

In all cases, type inference currently interprets this `pd.Timedelta` object as a `datetime.timedelta`, so the nanosecond part is ignored unless you manually specify the data type. This is a known limitation, e.g. https://github.com/apache/arrow/issues/18099 and https://github.com/apache/arrow/issues/36035 for the equivalent Timestamp case.

But when I put a breakpoint in our conversion utility, it seems that passing a nanosecond-resolution Timedelta properly gives us the number of days, whereas for a new seconds-resolution Timedelta those attributes are not set:

Running `arr = pa.array([pd.Timedelta(days=1)])`:

```
Thread 1 "python" hit Breakpoint 2, arrow::py::internal::PyDelta_to_s (pytimedelta=0x7fffd75d5120) at /home/joris/scipy/repos/arrow/python/pyarrow/src/arrow/python/datetime.h:149
149	  return (PyDateTime_DELTA_GET_DAYS(pytimedelta) * 86400LL +
(gdb) p pytimedelta->seconds
$2 = 0
(gdb) p pytimedelta->microseconds
$3 = 0
(gdb) p pytimedelta->days
$4 = 1   # <---- properly set to 1
```

Running `arr = pa.array([pd.Timedelta(days=1).as_unit('s')])`:

```
Thread 1 "python" hit Breakpoint 1, arrow::py::internal::PyDelta_to_s (pytimedelta=0x7fffd75d51c0) at /home/joris/scipy/repos/arrow/python/pyarrow/src/arrow/python/datetime.h:149
149	  return (PyDateTime_DELTA_GET_DAYS(pytimedelta) * 86400LL +
(gdb) p pytimedelta->seconds
$1 = 0
(gdb) p pytimedelta->microseconds
$2 = 0
(gdb) p pytimedelta->days
$3 = 0   # <---- this is now 0 !
```
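
To make the consequence concrete, here is a minimal pure-Python sketch of the arithmetic on the line where the breakpoint hits. Only the `days * 86400` term is visible in the gdb output above; the `+ seconds` term and the name `delta_to_s` are assumptions made for illustration, not the actual Arrow source. The field values are copied from the two gdb sessions.

```python
from datetime import timedelta

SECONDS_PER_DAY = 86400


def delta_to_s(days: int, seconds: int) -> int:
    """Hypothetical mirror of the C++ helper: whole seconds from the raw fields.

    Assumption: the truncated C++ expression continues with `+ seconds`.
    """
    return days * SECONDS_PER_DAY + seconds


# First session: a regular timedelta has its `days` field populated.
regular = timedelta(days=1)
seconds_ok = delta_to_s(regular.days, regular.seconds)

# Second session: gdb showed days == 0, seconds == 0, microseconds == 0,
# so any conversion that reads the raw fields sees a zero-length delta.
seconds_broken = delta_to_s(days=0, seconds=0)

print(seconds_ok, seconds_broken)
```

If the fields really are left unset for non-nanosecond `pd.Timedelta` objects, a one-day value would come out as zero in any code path that reads them directly.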
