Ben-Zvi opened a new pull request #1598: DRILL-6880: For Hash-Join hash-table 
build - treat null keys as an equal match 
URL: https://github.com/apache/drill/pull/1598
 
 
   
      When a new key is inserted into the hash-table's, a matching bucket is 
found (i.e., from its hash value modulo the table's size), but that bucket may 
contain an old key(s). A generated code method ("isKeyMatchInternalBuild") is 
used to compare and find if the two keys are identical.
   
   Previously a special check in that method addressed the case where (any pair 
of matching) columns where both null; returning "no match" if so.  In such a 
case, the hash-join would have entered the new key as a new entry at that 
bucket, leading to a growing useless chain that could end up in O(n^2) work (as 
each new key checks against all old ones).
   
     *The fix*: Generate a new method - "areBothKeysNull" - to perform that 
special check prior to the actual key match; used only for hash-join during 
build time.  Then this method returns the **inverse logic** - return a 
**"match"** when any two matching key-columns are both null.  This would avoid 
the useless chain in the hash-table. (A linked list would be created in the 
Hash-Join Helper, but that's an O(1) work that would be too complex/inefficient 
to avoid).
   
      Also note that the case of the keys like: <1,null>, <2,null>, <3,null>, 
<4,null>, ...... would not be addressed by the above change, as most of these 
keys would end up in different buckets, but that is still O(1) work, and no 
long chain.
   
     *The implementation*: Customize the internal method 
_setupIsKeyMatchInternal_ to also generate the new method _areBothKeysNull_ . 
The other methods it generates should not change from before, except 
_isKeyMatchInternalBuild_ where the special "both null" check is now eliminated.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to