[ 
https://issues.apache.org/jira/browse/HIVE-27278?focusedWorklogId=858283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858283
 ]

ASF GitHub Bot logged work on HIVE-27278:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Apr/23 16:30
            Start Date: 20/Apr/23 16:30
    Worklog Time Spent: 10m 
      Work Description: zabetak opened a new pull request, #4253:
URL: https://github.com/apache/hive/pull/4253

   ### What changes were proposed in this pull request?
   1. Add new pruning rules to remove a correlate when its inputs are empty. 
   2. Refactor the join pruning rules to avoid code duplication with the newly 
added rules.
   
   ### Why are the changes needed?
   Avoid redundant occurrences of `LogicalCorrelate` in the plan; these cannot 
be handled by the compiler and may lead to exceptions and compilation failures.
   
   ### Does this PR introduce _any_ user-facing change?
   Fixes some queries that fail at compile time.
   
   ```sql
   EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id = t2.id);
   ```
   **Before**
   ```
   HiveProject(id=[$0])
     LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
       HiveTableScan(table=[[default, t1]], table:alias=[t1])
       HiveValues(tuples=[[]])
   ```
   **After**
   ```
   HiveValues(tuples=[[]])
   ```
   
   
   ### How was this patch tested?
   ```
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex=.*subquery.*
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex=empty.*
   ```




Issue Time Tracking
-------------------

            Worklog Id:     (was: 858283)
    Remaining Estimate: 0h
            Time Spent: 10m

> Simplify correlated queries with empty inputs
> ---------------------------------------------
>
>                 Key: HIVE-27278
>                 URL: https://issues.apache.org/jira/browse/HIVE-27278
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The correlated query below will not produce any result regardless of the 
> contents of the tables.
> {code:sql}
> create table t1 (id int, val varchar(10)) stored as orc TBLPROPERTIES ('transactional'='true');
> create table t2 (id int, val varchar(10)) stored as orc TBLPROPERTIES ('transactional'='true');
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id = t2.id);
> {code}
> The CBO is able to derive that part of the query is empty and ends up with 
> the following plan.
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
>     HiveTableScan(table=[[default, t1]], table:alias=[t1])
>     HiveValues(tuples=[[]])
> {noformat}
> The presence of the LogicalCorrelate is not only redundant but also 
> problematic, since many parts of the optimizer assume that queries are 
> decorrelated and do not know how to handle a LogicalCorrelate.
> In the presence of views the same query can lead to the following exception 
> during compilation.
> {code:sql}
> CREATE MATERIALIZED VIEW v1 AS SELECT id FROM t2;
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id = t2.id);
> {code}
> {noformat}
> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not 
> enough rules to produce a node with desired properties: convention=HIVE, 
> sort=[], dist=any. All the inputs have relevant nodes, however the cost is 
> still infinite.
> Root: rel#185:RelSubset#3.HIVE.[].any
> Original rel:
> HiveProject(id=[$0]): rowcount = 4.0, cumulative cost = {20.0 rows, 13.0 cpu, 
> 0.0 io}, id = 178
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], 
> requiredColumns=[{}]): rowcount = 4.0, cumulative cost = {16.0 rows, 9.0 cpu, 
> 0.0 io}, id = 176
>     HiveTableScan(table=[[default, t1]], table:alias=[t1]): rowcount = 4.0, 
> cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 111
>     HiveValues(tuples=[[]]): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 
> cpu, 0.0 io}, id = 139
> Sets:
> Set#0, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)
>       rel#180:RelSubset#0.HIVE.[].any, best=rel#111
>               rel#111:HiveTableScan.HIVE.[].any(table=[default, 
> t1],htColumns=[0, 1, 2, 3, 4, 
> 5],insideView=false,plKey=default.t1;,table:alias=t1,tableScanTrait=null), 
> rowcount=4.0, cumulative cost={4.0 rows, 5.0 cpu, 0.0 io}
> Set#1, type: RecordType(NULL _o__c0)
>       rel#181:RelSubset#1.HIVE.[].any, best=rel#139
>               rel#139:HiveValues.HIVE.[].any(type=RecordType(NULL 
> _o__c0),tuples=[]), rowcount=1.0, cumulative cost={1.0 rows, 1.0 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)
>       rel#183:RelSubset#2.NONE.[].any, best=null
>               
> rel#182:LogicalCorrelate.NONE.[].any(left=RelSubset#180,right=RelSubset#181,correlation=$cor0,joinType=semi,requiredColumns={}),
>  rowcount=4.0, cumulative cost={inf}
> Set#3, type: RecordType(INTEGER id)
>       rel#185:RelSubset#3.HIVE.[].any, best=null
>               
> rel#184:HiveProject.HIVE.[].any(input=RelSubset#183,inputs=0,synthetic=false),
>  rowcount=4.0, cumulative cost={inf}
> Graphviz:
> digraph G {
>       root [style=filled,label="Root"];
>       subgraph cluster0{
>               label="Set 0 RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)";
>               rel111 [label="rel#111:HiveTableScan\ntable=[default, 
> t1],htColumns=[0, 1, 2, 3, 4, 
> 5],insideView=false,plKey=default.t1;,table:alias=t1,tableScanTrait=null\nrows=4.0,
>  cost={4.0 rows, 5.0 cpu, 0.0 io}",color=blue,shape=box]
>               subset180 [label="rel#180:RelSubset#0.HIVE.[].any"]
>       }
>       subgraph cluster1{
>               label="Set 1 RecordType(NULL _o__c0)";
>               rel139 [label="rel#139:HiveValues\ntype=RecordType(NULL 
> _o__c0),tuples=[]\nrows=1.0, cost={1.0 rows, 1.0 cpu, 0.0 
> io}",color=blue,shape=box]
>               subset181 [label="rel#181:RelSubset#1.HIVE.[].any"]
>       }
>       subgraph cluster2{
>               label="Set 2 RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)";
>               rel182 
> [label="rel#182:LogicalCorrelate\nleft=RelSubset#180,right=RelSubset#181,correlation=$cor0,joinType=semi,requiredColumns={}\nrows=4.0,
>  cost={inf}",shape=box]
>               subset183 [label="rel#183:RelSubset#2.NONE.[].any"]
>       }
>       subgraph cluster3{
>               label="Set 3 RecordType(INTEGER id)";
>               rel184 
> [label="rel#184:HiveProject\ninput=RelSubset#183,inputs=0,synthetic=false\nrows=4.0,
>  cost={inf}",shape=box]
>               subset185 [label="rel#185:RelSubset#3.HIVE.[].any"]
>       }
>       root -> subset185;
>       subset180 -> rel111[color=blue];
>       subset181 -> rel139[color=blue];
>       subset183 -> rel182; rel182 -> subset180[label="0"]; rel182 -> 
> subset181[label="1"];
>       subset185 -> rel184; rel184 -> subset183;
> }
>       at 
> org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:742)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:365)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:520)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyMaterializedViewRewriting(CalcitePlanner.java:2058)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1722)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1591)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1343)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12820)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) 
> ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
>       at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) 
> ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
>       at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) 
> ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
>       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) 
> ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
>       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) 
> ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:733) 
> ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:703) 
> ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
>  ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) 
> ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>  ~[test-classes/:?]
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_261]
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_261]
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_261]
>       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261]
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
>  ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
> ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:413) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.Suite.runChild(Suite.java:128) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.Suite.runChild(Suite.java:27) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:95)
>  ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>       at org.junit.rules.RunRules.evaluate(RunRules.java:20) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:413) 
> ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>  ~[surefire-junit4-3.0.0-M4.jar:3.0.0-M4]
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>  ~[surefire-junit4-3.0.0-M4.jar:3.0.0-M4]
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>  ~[surefire-junit4-3.0.0-M4.jar:3.0.0-M4]
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>  ~[surefire-junit4-3.0.0-M4.jar:3.0.0-M4]
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:377)
>  ~[surefire-booter-3.0.0-M4.jar:3.0.0-M4]
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:138) 
> ~[surefire-booter-3.0.0-M4.jar:3.0.0-M4]
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:465) 
> ~[surefire-booter-3.0.0-M4.jar:3.0.0-M4]
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:451) 
> ~[surefire-booter-3.0.0-M4.jar:3.0.0-M4]
> {noformat}
> The goal of this ticket is to get rid of the redundant correlate, both to 
> avoid compilation failures and to unlock further simplifications and 
> improve plan readability.
> The plan can be simplified further based on the following observations.
> If the right side of the correlate is empty, then the whole correlate is 
> empty when the joinType is SEMI/INNER. Moreover, if the correlate type is 
> LEFT we can also drop the correlate and use t1 padded with nulls for the 
> right side. Lastly, if the type is ANTI the result is the entire t1, so the 
> correlate can also be dropped. RIGHT and FULL correlates are invalid and 
> should never appear in the plan.
> If the left side of the correlate is empty, the result is empty and the 
> correlate can be dropped for every legal joinType (INNER/SEMI/ANTI/LEFT).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
