Rossil2012 opened a new issue, #9896:
URL: https://github.com/apache/arrow-datafusion/issues/9896

   In the `symmetric_hash_join` implementation, when a `RecordBatch` arrives, 
the probe side updates the `(hash_value, indices)` entries in its internal hash map. The 
call chain is `update_internal_state` -> `update_hash` -> `update_from_iter`.
   
   However, when the hash value is already present in the hash map, it simply appends the 
new indices to the list and does not check whether the keys are actually equal, 
which makes a collision between distinct keys with the same hash possible.
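
   To make the concern concrete, here is a minimal sketch in plain Rust. It is not the DataFusion code: `build_map`, `probe_candidates`, `probe_verified`, and `weak_hash` are hypothetical names, and the hash is deliberately weak so that collisions are easy to trigger. It shows that appending row indices by hash value alone groups distinct keys together, and that comparing the actual keys at probe time is one possible way to filter out the false candidates.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the join hash map: hash value -> row indices.
/// Indices are appended without comparing keys, as described above.
fn build_map(keys: &[i64], hash: fn(i64) -> u64) -> HashMap<u64, Vec<usize>> {
    let mut map: HashMap<u64, Vec<usize>> = HashMap::new();
    for (row, &key) in keys.iter().enumerate() {
        map.entry(hash(key)).or_default().push(row);
    }
    map
}

/// Probe by hash only: returns every row that shares the probe key's hash,
/// including rows whose key differs (collisions).
fn probe_candidates(
    map: &HashMap<u64, Vec<usize>>,
    probe_key: i64,
    hash: fn(i64) -> u64,
) -> Vec<usize> {
    map.get(&hash(probe_key)).cloned().unwrap_or_default()
}

/// Probe by hash, then keep only rows whose stored key equals the probe key.
fn probe_verified(
    map: &HashMap<u64, Vec<usize>>,
    build_keys: &[i64],
    probe_key: i64,
    hash: fn(i64) -> u64,
) -> Vec<usize> {
    probe_candidates(map, probe_key, hash)
        .into_iter()
        .filter(|&row| build_keys[row] == probe_key)
        .collect()
}

/// Deliberately weak hash (assumption for illustration) that forces collisions:
/// keys 1, 5, and 9 all hash to 1.
fn weak_hash(key: i64) -> u64 {
    key.rem_euclid(4) as u64
}
```

   With keys `[1, 5, 2]`, probing for key `1` by hash alone yields rows 0 and 1, because `5` collides with `1`; the verified probe keeps only row 0. With a real 64-bit hash the collision is far less likely, but the same reasoning applies.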
   
   I'm wondering whether you are aware of this potential issue and accept it 
because hash collisions in u64 are rare, or whether there is a safeguard elsewhere 
that I have overlooked. Thanks for your time.
   
   
https://github.com/apache/arrow-datafusion/blob/cd7a00b08309f7229073e4bba686d6271726ab1c/datafusion/physical-plan/src/joins/symmetric_hash_join.rs#L1017-L1039
   
   
https://github.com/apache/arrow-datafusion/blob/cd7a00b08309f7229073e4bba686d6271726ab1c/datafusion/physical-plan/src/joins/hash_join.rs#L910-L948
   
   
https://github.com/apache/arrow-datafusion/blob/cd7a00b08309f7229073e4bba686d6271726ab1c/datafusion/physical-plan/src/joins/utils.rs#L203-L229


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
