jorisvandenbossche commented on issue #35088:
URL: https://github.com/apache/arrow/issues/35088#issuecomment-1506526303
@lukemanley thanks for the report. This is an interesting bug. The two
arrays appear to be identical, but their underlying data buffers differ
because they were created differently. The differing values are masked as
null, so in theory the actual value "behind" the null shouldn't matter.
"Viewing" the data buffer as an int64 array to see the values:
```
In [20]: pa.Array.from_buffers(pa.int64(), 1, [None, arr2.buffers()[1]])
Out[20]:
<pyarrow.lib.Int64Array object at 0x7f4c1af64820>
[
0
]
In [21]: pa.Array.from_buffers(pa.int64(), 1, [None, arr3.buffers()[1]])
Out[21]:
<pyarrow.lib.Int64Array object at 0x7f4bf5998dc0>
[
-9223372036854775808
]
```
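As an aside, that second value is exactly the minimum of int64 (`-(2**63)`), which also happens to be the sentinel value pandas uses internally for `NaT` in datetime64 data. A quick check using only plain Python:

```python
# The value seen "behind" the null in the second data buffer is int64 min.
INT64_MIN = -(2 ** 63)

print(INT64_MIN)                             # -9223372036854775808
print(INT64_MIN == -9223372036854775808)     # True
```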
And so my assumption is that the overflow comes from actually subtracting
the values in the second case: `86400000000 - (-9223372036854775808)` would
indeed overflow int64.
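To confirm the arithmetic, a minimal check that this difference cannot be represented as a signed 64-bit integer (Python ints are arbitrary precision, so we can compute the true result):

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

# The subtraction that the kernel would perform on the garbage value:
result = 86400000000 - INT64_MIN
print(result)              # 9223372123254775808
print(result > INT64_MAX)  # True -> does not fit in int64
```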
However, the way that `subtract_checked` is implemented, it _should_
normally only perform the actual subtraction for data values that are not
masked as null, exactly to avoid situations like the above. But it seems
there is a bug in this mechanism for skipping values behind nulls.
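For illustration, here is a pure-Python sketch (not Arrow's actual C++ kernel; the function and parameter names are made up) of what a null-aware checked subtraction should do: consult the validity mask first, and only subtract and check for overflow on valid slots, so garbage values behind nulls can never trigger an error:

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

def subtract_checked(left, right, valid):
    """Elementwise left - right over int64, skipping slots where
    valid[i] is False.  An OverflowError may only be raised for a
    *valid* slot; values behind nulls must never be inspected."""
    out = []
    for l, r, v in zip(left, right, valid):
        if not v:
            out.append(None)  # null in, null out; don't touch the data
            continue
        res = l - r
        if not (INT64_MIN <= res <= INT64_MAX):
            raise OverflowError(f"{l} - {r} overflows int64")
        out.append(res)
    return out

# The slot holds the int64-min garbage value but is masked as null,
# so no overflow is raised:
print(subtract_checked([86400000000], [INT64_MIN], [False]))  # [None]
```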