Seonggon Namgung created HIVE-29159:
---------------------------------------

             Summary: Consider DPP optimization when computing the benefit of 
the SemiJoin branch
                 Key: HIVE-29159
                 URL: https://issues.apache.org/jira/browse/HIVE-29159
             Project: Hive
          Issue Type: Improvement
            Reporter: Seonggon Namgung
            Assignee: Seonggon Namgung


To minimize the amount of shuffled data, Hive uses dynamic partition pruning 
(DPP) and dynamic semijoin reduction (DSR). Both optimizations are useful in 
most cases, but we often observe that DSR is less effective than expected when 
a TableScan is targeted by both DPP and DSR. This happens because the 
computation of the DSR benefit does not take DPP into account, so it 
overestimates the number of rows produced by the TableScan.
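
To illustrate the overestimation, here is a minimal sketch in Java. It assumes 
a toy cost model in which the DSR benefit is simply the number of rows the 
semijoin filter removes from the scan. The class name, the method rowsRemoved, 
and all numbers are hypothetical and are not taken from Hive.

```java
public class DsrBenefitSketch {
    // Hypothetical model: rows that a semijoin filter removes from a scan,
    // given the fraction of rows that survive the filter.
    static double rowsRemoved(double scanRows, double semiJoinSelectivity) {
        return scanRows * (1.0 - semiJoinSelectivity);
    }

    public static void main(String[] args) {
        double tableRows = 1_000_000;      // rows before any pruning (made-up)
        double dppSelectivity = 0.05;      // fraction surviving DPP (made-up)
        double semiJoinSelectivity = 0.5;  // fraction surviving DSR (made-up)

        // Benefit computed while ignoring DPP: based on the full table.
        double naive = rowsRemoved(tableRows, semiJoinSelectivity);

        // Benefit computed after DPP has already shrunk the scan.
        double adjusted =
            rowsRemoved(tableRows * dppSelectivity, semiJoinSelectivity);

        System.out.println("benefit ignoring DPP:    " + naive);
        System.out.println("benefit accounting DPP:  " + adjusted);
    }
}
```

In this toy model the benefit computed without DPP is larger by a factor of 
1 / dppSelectivity, which can make a SemiJoin branch look worthwhile when it 
is not.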

This JIRA aims to improve the computation of the benefit of DSR by adjusting 
the statistics based on DPP branches. The current plan for this JIRA consists 
of three steps:
1. Adjust the statistics of TableScan operators targeted by DPP.
2. Propagate the updated statistics from the TableScan operators to their 
descendants.
3. Adjust the order of the DPP branch removal steps if needed, and implement a 
fallback mechanism in case query execution fails because DPP fails.
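
Steps 1 and 2 could be sketched roughly as follows. This is only an 
illustration under simplifying assumptions: Node is a hypothetical stand-in 
for an operator and is not Hive's Operator or Statistics class, the DPP 
selectivity is assumed to be known, and descendants are scaled uniformly by 
the same ratio rather than having their statistics re-derived.

```java
import java.util.ArrayList;
import java.util.List;

public class DppStatsSketch {
    /** Hypothetical operator node holding only a row-count estimate. */
    static final class Node {
        final String name;
        long numRows;
        final List<Node> children = new ArrayList<>();

        Node(String name, long numRows) {
            this.name = name;
            this.numRows = numRows;
        }

        Node addChild(Node child) {
            children.add(child);
            return child;
        }
    }

    /** Step 1: scale the scan's row count by an assumed DPP selectivity. */
    static void applyDpp(Node scan, double dppSelectivity) {
        long before = scan.numRows;
        if (before <= 0) {
            return; // nothing to adjust
        }
        scan.numRows = Math.max(1L, Math.round(before * dppSelectivity));
        double ratio = (double) scan.numRows / before;
        for (Node child : scan.children) {
            propagate(child, ratio);
        }
    }

    /** Step 2: push the same reduction ratio down to every descendant. */
    static void propagate(Node node, double ratio) {
        node.numRows = Math.max(1L, Math.round(node.numRows * ratio));
        for (Node child : node.children) {
            propagate(child, ratio);
        }
    }

    public static void main(String[] args) {
        Node scan = new Node("TableScan", 1_000_000);
        Node filter = scan.addChild(new Node("Filter", 200_000));
        Node select = filter.addChild(new Node("Select", 200_000));

        applyDpp(scan, 0.1); // assume DPP keeps 10% of the rows

        for (Node n : List.of(scan, filter, select)) {
            System.out.println(n.name + " rows=" + n.numRows);
        }
    }
}
```

With adjusted row counts on the TableScan and its descendants, the DSR 
benefit can then be computed from the reduced estimates instead of the 
unpruned table size.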



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
