[ https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730328#comment-17730328 ]

Rong Rong edited comment on CALCITE-5740 at 6/8/23 1:21 AM:
------------------------------------------------------------

I see, yes, that was a poorly chosen example. You are correct: {{COUNT( * )}} 
results can differ if {{b.key}} is not unique.
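To make the uniqueness point concrete, here is a small in-memory Java sketch. The sample rows and the class and method names are made up for illustration, and nothing here is Calcite API. It computes the per-{{col}} {{COUNT( * )}} once with IN (semi-join) semantics and once as a plain inner join, using a {{b}} in which {{key = 1}} appears twice:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class InVsInnerJoinCount {
  // Hypothetical sample data: a(key, col) and b(key), where b.key = 1 appears twice.
  record ARow(int key, String col) {}

  static final List<ARow> A = List.of(new ARow(1, "x"), new ARow(2, "x"), new ARow(3, "y"));
  static final List<Integer> B_KEYS = List.of(1, 1, 2);

  /** SELECT col, COUNT(*) FROM a WHERE a.key IN (SELECT key FROM b) GROUP BY col */
  static Map<String, Long> inSubqueryCount(List<ARow> a, List<Integer> bKeys) {
    // Semi-join semantics: only the existence of a matching key matters.
    Set<Integer> keys = Set.copyOf(bKeys);
    return a.stream()
        .filter(r -> keys.contains(r.key()))
        .collect(Collectors.groupingBy(ARow::col, TreeMap::new, Collectors.counting()));
  }

  /** SELECT a.col, COUNT(*) FROM a JOIN b ON a.key = b.key GROUP BY a.col */
  static Map<String, Long> innerJoinCount(List<ARow> a, List<Integer> bKeys) {
    return a.stream()
        // Inner join semantics: emit one copy of the LHS row per matching RHS row.
        .flatMap(r -> bKeys.stream().filter(k -> k == r.key()).map(k -> r))
        .collect(Collectors.groupingBy(ARow::col, TreeMap::new, Collectors.counting()));
  }

  public static void main(String[] args) {
    System.out.println(inSubqueryCount(A, B_KEYS)); // {x=2}
    System.out.println(innerJoinCount(A, B_KEYS));  // {x=3}
    // Deduplicating the RHS keys before joining restores the IN result.
    System.out.println(innerJoinCount(A, B_KEYS.stream().distinct().toList())); // {x=2}
  }
}
{code}

Deduplicating the RHS keys before the join, which is what a distinct aggregate on {{b.key}} does, brings the inner-join count back in line with the IN result.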

However, even when I run just the second query through the planner:
{code:sql}
SELECT
  col, COUNT(*)
FROM
  a
WHERE
  a.key IN (SELECT key FROM b WHERE val BETWEEN 0 AND 10)
GROUP BY
  col
{code}
with the JOIN_TO_SEMI_JOIN rule enabled, it still produces an inner JOIN. The plan 
looks like this:
{code:sql}
    LogicalAggregate(group=[{1}], EXPR$1=[COUNT()])            <-- P1
      LogicalJoin(condition=[=($0, $2)], joinType=[inner])     <-- P2
        LogicalProject(key=[$0], col=[$1])
          LogicalTableScan(table=[[a]])
        LogicalAggregate(group=[{0}])                          <-- P3
          LogicalProject(key=[$1], val=[$4])
            LogicalFilter(condition=[AND(>=($4, 0), <=($4, 10))])
              LogicalTableScan(table=[[b]])
{code}
i.e., it still does not generate a SEMI-JOIN.

Aren't the following conditions:
- *P1* does not access any field from the RHS table
- *P2* joins on {{$2}}, which is the RHS table's {{key}}, and *P3* is a distinct 
aggregate on {{key}}

necessary and sufficient to convert this INNER JOIN into a SEMI-JOIN?
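A rough Java sketch of that check, under simplifying assumptions: the three records are hypothetical stand-ins for *P1*, *P2* and *P3* (they are not Calcite's {{Aggregate}} or {{Join}} classes), the join is assumed to be an inner equi-join, and RHS key indices are given relative to the RHS input ({{$2}} becomes 0 because the LHS has two fields). It encodes the listed conditions as a sufficient check only and does not attempt to show they are necessary.

{code:java}
import java.util.List;
import java.util.Set;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SemiJoinConditionCheck {
  // Hypothetical, simplified stand-ins for the plan nodes; these are NOT Calcite classes.

  /** P1: top aggregate; field indices refer to the join's output row. */
  record TopAggregate(Set<Integer> groupFields, Set<Integer> aggArgFields) {}

  /** P2: inner equi-join; rightKeys are indices into the RHS input. */
  record InnerEquiJoin(int leftFieldCount, List<Integer> rightKeys) {}

  /** P3: RHS aggregate; with no aggregate calls its output is exactly its group keys. */
  record RightAggregate(int groupCount, int aggCallCount) {}

  static boolean canConvertToSemiJoin(TopAggregate p1, InnerEquiJoin p2, RightAggregate p3) {
    // Condition 1: P1 reads no RHS field, i.e. every referenced index is on the LHS.
    boolean readsOnlyLhs =
        Stream.concat(p1.groupFields().stream(), p1.aggArgFields().stream())
            .allMatch(i -> i < p2.leftFieldCount());

    // Condition 2: P3 is a distinct aggregate (no aggregate calls) and the join keys
    // cover all of its group keys, so each LHS row matches at most one RHS row.
    boolean rhsUniqueOnJoinKeys =
        p3.aggCallCount() == 0
            && IntStream.range(0, p3.groupCount()).allMatch(k -> p2.rightKeys().contains(k));

    return readsOnlyLhs && rhsUniqueOnJoinKeys;
  }

  public static void main(String[] args) {
    // The plan above: group=[{1}] with COUNT(), join on =($0, $2), RHS group=[{0}].
    boolean planFromComment = canConvertToSemiJoin(
        new TopAggregate(Set.of(1), Set.of()),
        new InnerEquiJoin(2, List.of(0)),
        new RightAggregate(1, 0));
    // Grouping on $2 (the RHS key) reads an RHS field, so the check fails.
    boolean groupsOnRhs = canConvertToSemiJoin(
        new TopAggregate(Set.of(2), Set.of()),
        new InnerEquiJoin(2, List.of(0)),
        new RightAggregate(1, 0));
    System.out.println(planFromComment + " " + groupsOnRhs); // true false
  }
}
{code}

When both conditions hold, each LHS row matches at most one RHS row, so the inner join neither duplicates LHS rows nor contributes columns that *P1* reads, which is the same result a semi-join returns.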

This might have been a configuration issue on my side, but is there:
 # some other rule I can use to configure the planner to generate a SEMI-JOIN?
 # some default configuration that directly generates a SEMI-JOIN when going 
through the SqlToRelConverter?

Thanks!



> Support for AggToSemiJoinRule
> -----------------------------
>
>                 Key: CALCITE-5740
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5740
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Rong Rong
>            Priority: Major
>
> **Description**
> Currently we only have the JoinToSemiJoin and ProjectToSemiJoin rules, which 
> perform a check within the rule itself to see whether the project accesses 
> columns from the RHS result.
> This can be extended to Aggregate as well; experimental code: 
> https://github.com/walterddr/calcite/pull/1/files
> **Alternative**
> An alternative is to add a Project/Calc between the join and the aggregate to 
> activate the ProjectToSemiJoin rule. Please share any other alternatives I 
> haven't considered.
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
