Csaba Ringhofer created IMPALA-13260:
----------------------------------------

             Summary: Exchange on the probe side of outer joins could send NULL keys to local target
                 Key: IMPALA-13260
                 URL: https://issues.apache.org/jira/browse/IMPALA-13260
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Csaba Ringhofer


Currently, in partitioned joins, all NULL keys hash to the same value and are 
therefore sent to a single fragment instance. This can cause data skew if the 
number of NULL keys is large.

If a NULL key guarantees that no row is matched on the build side, then the 
columns from the build side will all be NULL, and it doesn't matter which 
fragment instance processes the row.

Always sending rows with a NULL key to a local fragment instance would both 
reduce data skew and make the shuffle cheaper (no compression/network). If 
mt_dop>0, then to avoid data skew completely these rows would need to be 
spread evenly among the local fragment instances.
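
The routing rule described above could be sketched roughly as follows. This is 
an illustrative model only, not Impala code: `ProbeSideRouter`, 
`hash_partition`, and the `null_keys_never_match` flag are hypothetical names, 
and CRC32 merely stands in for whatever hash the exchange really uses. The flag 
represents the precondition that a NULL key can never match a build-side row 
(which would not hold for a null-safe comparison such as IS NOT DISTINCT FROM).

```python
import itertools
import zlib
from collections import Counter


def hash_partition(key, num_instances):
    """Stand-in for hash partitioning (hypothetical, not Impala's real hash)."""
    return zlib.crc32(repr(key).encode()) % num_instances


class ProbeSideRouter:
    """Picks the target fragment instance for each probe-side row."""

    def __init__(self, num_instances, local_instances, null_keys_never_match):
        self.num_instances = num_instances
        self.null_keys_never_match = null_keys_never_match
        # Round-robin over the instances on this host (more than one if mt_dop>0).
        self._local_rr = itertools.cycle(local_instances)

    def target(self, key):
        # A NULL key that cannot match anything may be processed anywhere,
        # so keep it local and spread it evenly over the local instances.
        if key is None and self.null_keys_never_match:
            return next(self._local_rr)
        # Every other key must go to the instance that owns its hash partition.
        return hash_partition(key, self.num_instances)


def distribution(router, keys):
    """Counts how many rows each fragment instance would receive."""
    return Counter(router.target(k) for k in keys)


if __name__ == "__main__":
    null_rows = [None] * 1000
    # 4 instances in total; instances 0 and 1 run on the sending host.
    current = ProbeSideRouter(4, [0, 1], null_keys_never_match=False)
    proposed = ProbeSideRouter(4, [0, 1], null_keys_never_match=True)
    print("current: ", dict(distribution(current, null_rows)))
    print("proposed:", dict(distribution(proposed, null_rows)))
```

In this model the current behaviour puts every NULL-key row on one instance, 
while the proposed rule splits them evenly between the two local instances and 
leaves the placement of non-NULL keys unchanged.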


