jorisvandenbossche commented on issue #35088:
URL: https://github.com/apache/arrow/issues/35088#issuecomment-1506526303

   @lukemanley thanks for the report. This is an interesting bug. The 
difference between the two arrays, which appear to be identical, is that their 
actual data buffers differ because they were created differently (the data 
behind the null slots is masked, so the actual value "behind" a null 
shouldn't matter in theory). 
   "Viewing" the data buffer as an int64 array to see the values:
   
   ```
   In [20]: pa.Array.from_buffers(pa.int64(), 1, [None, arr2.buffers()[1]])
   Out[20]: 
   <pyarrow.lib.Int64Array object at 0x7f4c1af64820>
   [
     0
   ]
   
   In [21]: pa.Array.from_buffers(pa.int64(), 1, [None, arr3.buffers()[1]])
   Out[21]: 
   <pyarrow.lib.Int64Array object at 0x7f4bf5998dc0>
   [
     -9223372036854775808
   ]
   ```
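   This "same logical values, different masked bytes" situation can be reproduced directly. A hypothetical reconstruction (the original `arr2`/`arr3` came from elsewhere, so the construction below is only illustrative):
   
   ```python
   import pyarrow as pa
   
   # Hypothetical reconstruction: two single-element all-null int64 arrays
   # whose masked data bytes differ (0 vs INT64_MIN).
   validity = pa.py_buffer(bytes([0]))  # validity bitmap: bit 0 unset -> null
   buf_zero = pa.py_buffer((0).to_bytes(8, "little", signed=True))
   buf_min = pa.py_buffer((-2**63).to_bytes(8, "little", signed=True))
   
   a = pa.Array.from_buffers(pa.int64(), 1, [validity, buf_zero])
   b = pa.Array.from_buffers(pa.int64(), 1, [validity, buf_min])
   
   # Logically equal: the value "behind" a null shouldn't matter
   print(a.equals(b))  # True
   ```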
   
   And so my assumption is that the overflow comes from actually subtracting 
the values in the second case: `86400000000 - (-9223372036854775808)` would 
indeed overflow int64. 
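   To make the arithmetic concrete, a quick pure-Python check (Python ints 
don't overflow, so we can see that the true difference exceeds the int64 range):
   
   ```python
   # int64 bounds
   INT64_MAX = 2**63 - 1   # 9223372036854775807
   INT64_MIN = -2**63      # -9223372036854775808
   
   # the subtraction from the second case above, done in unbounded Python ints
   diff = 86400000000 - INT64_MIN
   
   # the true result does not fit in int64, so a checked kernel must raise
   print(diff > INT64_MAX)  # True
   ```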
   
   However, the way `subtract_checked` is implemented, it _should_ 
normally only perform the actual subtraction for data values that are not 
masked as null, exactly to avoid situations like the above. But it seems there 
is a bug in this mechanism for skipping values behind nulls.
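   For illustration, a minimal pure-Python sketch of that intended mechanism 
(a hypothetical helper, not the actual C++ kernel): a checked subtract should 
consult the validity mask and skip masked slots entirely, so the garbage bytes 
behind a null never reach the overflow check:
   
   ```python
   INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
   
   def subtract_checked(left, right, valid):
       """Hypothetical sketch: element-wise checked subtraction skipping nulls.
   
       `valid[i]` is False where the output is null; for those slots the
       (possibly garbage) buffer values must never reach the overflow check.
       """
       out = []
       for l, r, v in zip(left, right, valid):
           if not v:
               out.append(None)  # null slot: don't touch the data values
               continue
           res = l - r
           if not (INT64_MIN <= res <= INT64_MAX):
               raise OverflowError("overflow in checked subtraction")
           out.append(res)
       return out
   
   # the garbage value behind the null never triggers the overflow check
   print(subtract_checked([86400000000], [-2**63], [False]))  # [None]
   ```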

