llama90 commented on PR #38147: URL: https://github.com/apache/arrow/pull/38147#issuecomment-1753252441
@pitrou I thought you were pointing out a place in the code where a bug could occur due to implicit type conversion, so I made the changes. I didn't realize that such review comments were meant to start a discussion first. I apologize for the confusion.

Am I right in understanding that you are asking, fundamentally, why the code was changed?

The issue originally reported was that the Inner Join returned incorrect results. Analyzing the code showed that, while executing `BuildBloomFilter_exec_task`, the call to `HashBatch` computed offsets incorrectly, which produced incorrect hash values. `HashBatch` copies the ColumnArrays in the Key Batch using an offset and a length, and it calls `Slice` in the process. The issue used a `large_utf8` key column, and the original code always computed the offset for such binary types using the size of `uint32_t`, even though `large_utf8` uses 64-bit offsets. That mismatch caused the incorrect Inner Join results.
