Csaba Ringhofer created IMPALA-13261:
----------------------------------------
Summary: Consider the effect of NULL keys when choosing BROADCAST vs SHUFFLE join
Key: IMPALA-13261
URL: https://issues.apache.org/jira/browse/IMPALA-13261
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Csaba Ringhofer
Currently, all NULL join keys hash to the same value, so in partitioned
(SHUFFLE) joins every row with a NULL key is sent to a single fragment
instance. This can cause data skew if the number of NULL keys is large.
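
The skew described above can be illustrated with a small simulation. This is
only a sketch, not Impala code: the function name partition_counts, the use of
Python's built-in hash(), and the choice of 0 as the hash of NULL are all
assumptions made for illustration.

```python
from collections import Counter


def partition_counts(keys, num_instances):
    """Count rows per fragment instance under hash partitioning on the key.

    Assumption (illustrative only): every NULL (None) key gets the same
    fixed hash value, 0, so all NULL-key rows land on one instance.
    """
    counts = Counter()
    for key in keys:
        h = 0 if key is None else hash(key)
        counts[h % num_instances] += 1
    return [counts[i] for i in range(num_instances)]


# 1000 rows with distinct integer keys plus 9000 rows with NULL keys.
keys = list(range(1000)) + [None] * 9000
rows_per_instance = partition_counts(keys, 10)
print(rows_per_instance)
```

With these inputs one instance receives all 9000 NULL-key rows on top of its
share of the non-NULL rows, while the other nine instances each receive only
their share of the 1000 non-NULL rows.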
The planner could prefer BROADCAST for a LEFT OUTER JOIN when the probe side
contains a large number of NULL keys, since a broadcast join does not
repartition the probe side.
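
One way such a preference might be expressed is to compare the row count
handled by the busiest instance under each strategy. The sketch below is a
hypothetical toy heuristic, not Impala's actual cost model: the function
prefer_broadcast, its parameters, and the assumption that non-NULL rows and
scan ranges are spread evenly across instances are all invented for
illustration.

```python
def prefer_broadcast(probe_rows, probe_null_fraction, build_rows, num_instances):
    """Return True if BROADCAST looks cheaper for the busiest instance.

    Hypothetical model (not Impala's planner):
      - SHUFFLE: all NULL-key probe rows go to one instance; the remaining
        probe rows and the build rows are spread evenly.
      - BROADCAST: probe rows stay where they were scanned (assumed evenly
        spread); every instance receives the whole build side.
    """
    null_rows = probe_rows * probe_null_fraction
    non_null_rows = probe_rows - null_rows

    # Rows handled by the most loaded instance under each strategy.
    shuffle_max = null_rows + (non_null_rows + build_rows) / num_instances
    broadcast_max = probe_rows / num_instances + build_rows

    return broadcast_max < shuffle_max


# Half of the probe keys are NULL: the skewed instance dominates SHUFFLE.
skewed = prefer_broadcast(1_000_000, 0.5, 10_000, 10)
# No NULL keys: SHUFFLE spreads the work evenly and stays cheaper here.
uniform = prefer_broadcast(1_000_000, 0.0, 10_000, 10)
print(skewed, uniform)
```

A real planner would also have to weigh memory and network cost, which this
sketch ignores.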
Another potential solution to the same problem is IMPALA-13260, which proposes
sending rows with NULL keys to local fragment instances in this situation.