Stamatis Zampetakis created HIVE-29469:
------------------------------------------

             Summary: Some CBO optimizations are cancelled when using a CTE suggester
                 Key: HIVE-29469
                 URL: https://issues.apache.org/jira/browse/HIVE-29469
             Project: Hive
          Issue Type: Bug
          Components: CBO
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Setting the hive.optimize.cte.suggester.class property triggers the [CTE rewriting phase|https://github.com/apache/hive/blob/7060d94843fdbc548445db6aac84dd60b44641ee/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1731] that is performed towards the end of CBO optimization.

Internally, the CTE rewriting phase reuses some logic from the MV rewriting, and the latter reverts some optimizations (e.g., it expands IN to OR via HiveInBetweenExpandRule) to allow certain replacements to happen. However, cancelling optimizations that were already performed leads to non-optimal plans, so ideally we should find a way to restore them.

One instance of the problem can be seen in the EXPLAIN CBO plans of TPC-DS queries 23 and 64 when CTE rewriting is performed aggressively by setting the properties below.
{code:sql}
set hive.optimize.cte.suggester.class=org.apache.hadoop.hive.ql.optimizer.calcite.CommonTableExpressionPrintSuggester;
set hive.optimize.cte.materialize.threshold=1;
set hive.optimize.cte.materialize.full.aggregate.only=false;
{code}
Below is a snippet from the CBO plan of q23 with CTE rewrite disabled and enabled, respectively.

+Plan with CTE rewrite disabled+
{noformat}
...
HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
  HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
...
{noformat}
+Plan with CTE rewrite enabled+
{noformat}
...
HiveFilter(condition=[OR(=($6, 1999), =($6, 2000), =($6, 2001), =($6, 2002))])
  HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
...
{noformat}
When CTE rewrite is enabled, this part of the plan is not optimal: the IN predicate on $6 has been expanded into a disjunction of equalities.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)