zanmato1984 commented on issue #41813: URL: https://github.com/apache/arrow/issues/41813#issuecomment-2168286330
The bug is in this line: https://github.com/apache/arrow/blob/69e8a78c018da88b60f9eb2b3b45703f81f3c93d/cpp/src/arrow/compute/row/compare_internal_avx2.cc#L284

If a slot of `offset_right` contains a value `>= 0x80000000`, i.e. a row offset of `2GB` or more, it is added to `right_base` as a negative integer, so data is gathered from an invalid address.

Proof follows. Similar to @amoeba's reproduction, mine is:

```
fault address: 0x4a7f85638
right_base: 0x0000000527e1e800
offset_right: (400023873834003288, 400025248223538328, 400025523922057392, -9217058400476779112)
```

Decoding `offset_right` further into its individual 32-bit slots gives:

```
(0x58D2B58 0x58D2B98 0x58D2C98 0x58D2CD8 0x3676B8B0 0x58D2D18 0x58D2D98 0x80166E38)
```

Note that the last offset is larger than `0x80000000`, and its signed interpretation is `-2146013640`. And `right_base(0x0000000527e1e800) + (-2146013640) = 0x4a7f85638` is exactly the offending address.

I didn't work through @amoeba's case, but I believe the same math applies.

I'm working on a fix.
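The arithmetic above can be checked with a small, self-contained C++ sketch. The helper names below are hypothetical and are not Arrow functions; they only model adding a 32-bit offset to a 64-bit base address, once with the offset read as a signed value (the faulty behavior described above) and once as an unsigned value. This illustrates the math only and is not meant to describe the eventual fix.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these exist in Arrow.

// Reinterpret a 32-bit offset as a signed 32-bit value, widened to 64 bits.
// Written without a narrowing cast so the result is the same on any compiler.
constexpr int64_t AsSigned32(uint32_t offset) {
  return offset >= 0x80000000u
             ? static_cast<int64_t>(offset) - (int64_t{1} << 32)
             : static_cast<int64_t>(offset);
}

// Models the faulty addition: the offset is sign-extended before being added,
// so offsets >= 0x80000000 move the address backwards.
constexpr uint64_t AddressWithSignedOffset(uint64_t base, uint32_t offset) {
  return base + static_cast<uint64_t>(AsSigned32(offset));
}

// Models the intended addition: the offset is zero-extended before being added.
constexpr uint64_t AddressWithUnsignedOffset(uint64_t base, uint32_t offset) {
  return base + static_cast<uint64_t>(offset);
}

// Values taken from the reproduction in this comment.
constexpr uint64_t kRightBase = 0x0000000527e1e800ULL;
constexpr uint32_t kLastOffset = 0x80166E38u;

// The signed interpretation of the last slot is negative.
static_assert(AsSigned32(kLastOffset) == -2146013640, "signed view of offset");
// Sign extension reproduces the reported fault address.
static_assert(AddressWithSignedOffset(kRightBase, kLastOffset) == 0x4a7f85638ULL,
              "fault address");
// Zero extension lands exactly 4 GiB higher, past right_base as expected.
static_assert(AddressWithUnsignedOffset(kRightBase, kLastOffset) == 0x5a7f85638ULL,
              "address with unsigned offset");
```

The two results differ by exactly `0x100000000`, which is what a misread sign bit in a 32-bit offset produces.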
