icexelloss commented on PR #36499: URL: https://github.com/apache/arrow/pull/36499#issuecomment-1627295592
> Now, taking this a step further, if the two batches have a different address, then things seem OK because the hashes would then be recomputed. I suspect the bug is triggered when the first and second batches happen to have the same address, which we know is a race condition, leading to the use of incorrect hashes. Granted, I haven't yet nailed down the exact triggering condition in the pre-PR code. It would take some effort - let me know if you'd like this investigated. For the time being, in the current version of the PR, I preferred to find the minimal change that works and I can explain.

If that is the case, then I would argue this is not a concurrency issue, but rather that the KeyHasher class is broken and fails to cache hashes correctly. I remember we had this issue before, and I thought one of your PRs fixed it, but I cannot seem to find the change anymore. (There was one about the allocator reusing the buffer address for the second batch on macOS, or something like that.)
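
To make the suspected failure mode concrete, here is a minimal sketch of a hash cache that treats "same address" as "same batch". It is not Arrow's actual KeyHasher: `Batch`, `AddressKeyedHasher`, `toy_hash`, and `invalidate` are hypothetical names, and the `address` field only models an allocator handing out the same buffer address twice.

```python
from dataclasses import dataclass
from typing import List, Optional


def toy_hash(key: int) -> int:
    # Deterministic stand-in for a real key hash (assumption, for illustration only).
    return (key * 2654435761) % 2**32


@dataclass
class Batch:
    # Hypothetical stand-in for a record batch: `address` models the buffer
    # address handed out by the allocator, `keys` models the key column.
    address: int
    keys: List[int]


class AddressKeyedHasher:
    """Toy hasher that assumes an unchanged address means an unchanged batch."""

    def __init__(self) -> None:
        self._cached_address: Optional[int] = None
        self._cached_hashes: List[int] = []

    def hashes_for(self, batch: Batch) -> List[int]:
        # Recompute only when the address differs from the cached one.
        if batch.address != self._cached_address:
            self._cached_address = batch.address
            self._cached_hashes = [toy_hash(k) for k in batch.keys]
        return self._cached_hashes

    def invalidate(self) -> None:
        # Drop the cache explicitly, e.g. whenever a new batch arrives.
        self._cached_address = None
        self._cached_hashes = []


def expected_hashes(batch: Batch) -> List[int]:
    # What a correct hasher should return for this batch.
    return [toy_hash(k) for k in batch.keys]


if __name__ == "__main__":
    hasher = AddressKeyedHasher()
    first = Batch(address=0x1000, keys=[1, 2, 3])
    second = Batch(address=0x1000, keys=[4, 5, 6])  # same address, new contents

    hasher.hashes_for(first)
    stale = hasher.hashes_for(second) != expected_hashes(second)
    print("stale hashes served for second batch:", stale)
```

In this sketch, a second batch with a different address is hashed correctly, while one that reuses the address gets the first batch's hashes unless the cache is invalidated first. No threads are involved, which is the sense in which this would be a caching bug rather than a concurrency one.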
