Csaba Ringhofer created IMPALA-13260:
----------------------------------------
Summary: Exchange on the probe side of outer joins could send NULL
keys to local target
Key: IMPALA-13260
URL: https://issues.apache.org/jira/browse/IMPALA-13260
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Csaba Ringhofer
Currently NULL keys are hashed to a single value and sent to a single fragment
instance in partitioned joins. This can cause data skew if the number of NULL
keys is large.
If a NULL key guarantees that no row is matched on the build side, then columns
from build side will be all NULL and it doesn't matter which fragment instance
processes the row.
Always sending rows with NULL key to a local fragment instance would both
reduce data skew and make the shuffle cheaper (no compression/network). If
mt_dop>0 then to completely avoid data these rows would need to be spread
evenly among the local fragment instances.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)