Rossil2012 opened a new issue, #9896: URL: https://github.com/apache/arrow-datafusion/issues/9896
In the `symmetric_hash_join` implementation, when a `RecordBatch` arrives, the probe side updates the `(hash_value, indices)` entries in its inner hashmap. The call chain is `update_internal_state` -> `update_hash` -> `update_from_iter`. However, when the hash value is already present in the hashmap, the code simply inserts the new indices into the list without checking whether the keys are actually equal, which makes a collision possible.

I'm wondering whether you are aware of this vulnerability but ignore it because hash collisions in u64 are rare, or whether there is a safeguard elsewhere that I overlooked. Thanks for your time.

https://github.com/apache/arrow-datafusion/blob/cd7a00b08309f7229073e4bba686d6271726ab1c/datafusion/physical-plan/src/joins/symmetric_hash_join.rs#L1017-L1039
https://github.com/apache/arrow-datafusion/blob/cd7a00b08309f7229073e4bba686d6271726ab1c/datafusion/physical-plan/src/joins/hash_join.rs#L910-L948
https://github.com/apache/arrow-datafusion/blob/cd7a00b08309f7229073e4bba686d6271726ab1c/datafusion/physical-plan/src/joins/utils.rs#L203-L229
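To make the concern concrete, here is a minimal, self-contained sketch of the pattern in question. It is not DataFusion's code: `ToyJoinMap`, `weak_hash`, and the `i64` key column are hypothetical names chosen for illustration, and the hash function is deliberately weak so that collisions are easy to trigger. The map is keyed only by the hash value, so `candidates` can return rows whose keys differ, while `probe` shows the kind of key equality check that would filter them out.

```rust
use std::collections::HashMap;

/// Toy stand-in for a join hash map (hypothetical, not DataFusion's type).
/// It maps a 64-bit hash value to the row indices that produced it.
struct ToyJoinMap {
    buckets: HashMap<u64, Vec<usize>>,
    keys: Vec<i64>, // the join key of each inserted row
}

impl ToyJoinMap {
    fn new() -> Self {
        ToyJoinMap { buckets: HashMap::new(), keys: Vec::new() }
    }

    /// Insert a row under its hash value. As in the pattern described
    /// above, no key comparison happens here: rows with different keys
    /// but the same hash end up in the same list.
    fn insert(&mut self, hash: u64, key: i64) {
        let row = self.keys.len();
        self.keys.push(key);
        self.buckets.entry(hash).or_default().push(row);
    }

    /// Rows matched by hash value only. May contain false matches.
    fn candidates(&self, hash: u64) -> Vec<usize> {
        self.buckets.get(&hash).cloned().unwrap_or_default()
    }

    /// Rows matched by hash value and then verified by comparing keys.
    fn probe(&self, hash: u64, key: i64) -> Vec<usize> {
        self.candidates(hash)
            .into_iter()
            .filter(|&row| self.keys[row] == key)
            .collect()
    }
}

/// Deliberately weak hash (assumption for the demo) that forces collisions.
fn weak_hash(key: i64) -> u64 {
    key.rem_euclid(4) as u64
}
```

With keys 1 and 5 inserted, both hash to the same value under `weak_hash`, so `candidates(weak_hash(1))` returns both rows, while `probe(weak_hash(1), 1)` returns only the row whose key is really 1. If such an equality check exists somewhere after the hash lookup, the missing check at insertion time is harmless; if not, a collision would produce incorrect join output.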
