walterddr commented on code in PR #11882:
URL: https://github.com/apache/pinot/pull/11882#discussion_r1373742525
##########
pinot-query-planner/src/test/resources/queries/JoinPlans.json:
##########
@@ -421,6 +421,28 @@
"\n LogicalTableScan(table=[[b]])",
"\n"
]
+ },
+ {
+ "description": "nexted reused tmp table SEMI JOINs",
+ "sql": "EXPLAIN PLAN FOR WITH tmp1 AS ( SELECT * FROM a WHERE col2 NOT
IN ('foo', 'bar') ), tmp2 AS ( SELECT * FROM b WHERE col1 IN (SELECT col1 FROM
tmp1) AND col3 < 100 ) SELECT * FROM tmp2 WHERE col1 IN (SELECT col1 from tmp1
WHERE col3 > 10)",
+ "output": [
+ "Execution Plan",
+ "\nLogicalJoin(condition=[=($0, $7)], joinType=[semi])",
+ "\n PinotLogicalExchange(distribution=[hash[0]])",
+ "\n LogicalJoin(condition=[=($0, $7)], joinType=[semi])",
+ "\n LogicalFilter(condition=[<($2, 100)])",
+ "\n LogicalTableScan(table=[[b]])",
+ "\n PinotLogicalExchange(distribution=[broadcast],
relExchangeType=[PIPELINE_BREAKER])",
+ "\n LogicalProject(col1=[$0], col2=[$1])",
+ "\n LogicalFilter(condition=[AND(<>($1, 'bar'), <>($1,
'foo'))])",
+ "\n LogicalTableScan(table=[[a]])",
+ "\n PinotLogicalExchange(distribution=[hash[0]])",
+ "\n LogicalProject(col1=[$0], col3=[$2])",
+ "\n LogicalFilter(condition=[>($2, 10)])",
Review Comment:
this is not unexpected. (we can potentially make it more smart). the problem
is that
```
WITH tmp1
AS (SELECT *
FROM a
WHERE col2 NOT IN ( 'foo', 'bar' )),
tmp2
AS (SELECT *
FROM b
WHERE col1 IN (SELECT col1
FROM tmp1)
AND col3 < 100)
SELECT *
FROM tmp2
WHERE col1 IN (SELECT col1
FROM tmp1 <-- this is the tmp1 modification into tmp1'
WHERE col3 > 10)
```
at this stage Calcite's doesn't know if you want to do a table spool of tmp1
and make tmp1' derived from tmp1 --> thus col3 > 10 is not pushed down.
only when we decided that table spool is not possible and then we decided to
copy tmp1 and tmp1' into 2 separate sub queries, at that time i think the hep
optimizer already get passed the filter merging phase.
rewriting the query in this way
```
WITH tmp1
AS (SELECT *
FROM a
WHERE col2 NOT IN ( 'foo', 'bar' )),
tmp2
AS (SELECT *
FROM b
WHERE col1 IN (SELECT col1
FROM tmp1)
AND col3 < 100),
tmp3 -- here we explicitly tell that tmp3 and tmp1 are not related
AS (SELECT *
FROM a
WHERE col2 NOT IN ( 'foo', 'bar' ) AND col3 > 10),
SELECT *
FROM tmp2
WHERE col1 IN (SELECT col1
FROM tmp3)
```
produces no multi-filter plan
##########
pinot-query-planner/src/test/resources/queries/JoinPlans.json:
##########
@@ -421,6 +421,28 @@
"\n LogicalTableScan(table=[[b]])",
"\n"
]
+ },
+ {
+ "description": "nexted reused tmp table SEMI JOINs",
+ "sql": "EXPLAIN PLAN FOR WITH tmp1 AS ( SELECT * FROM a WHERE col2 NOT
IN ('foo', 'bar') ), tmp2 AS ( SELECT * FROM b WHERE col1 IN (SELECT col1 FROM
tmp1) AND col3 < 100 ) SELECT * FROM tmp2 WHERE col1 IN (SELECT col1 from tmp1
WHERE col3 > 10)",
+ "output": [
+ "Execution Plan",
+ "\nLogicalJoin(condition=[=($0, $7)], joinType=[semi])",
+ "\n PinotLogicalExchange(distribution=[hash[0]])",
+ "\n LogicalJoin(condition=[=($0, $7)], joinType=[semi])",
+ "\n LogicalFilter(condition=[<($2, 100)])",
+ "\n LogicalTableScan(table=[[b]])",
+ "\n PinotLogicalExchange(distribution=[broadcast],
relExchangeType=[PIPELINE_BREAKER])",
+ "\n LogicalProject(col1=[$0], col2=[$1])",
+ "\n LogicalFilter(condition=[AND(<>($1, 'bar'), <>($1,
'foo'))])",
+ "\n LogicalTableScan(table=[[a]])",
+ "\n PinotLogicalExchange(distribution=[hash[0]])",
+ "\n LogicalProject(col1=[$0], col3=[$2])",
+ "\n LogicalFilter(condition=[>($2, 10)])",
Review Comment:
this is not unexpected. (we can potentially make it more smart). the problem
is that
```
WITH tmp1
AS (SELECT *
FROM a
WHERE col2 NOT IN ( 'foo', 'bar' )),
tmp2
AS (SELECT *
FROM b
WHERE col1 IN (SELECT col1
FROM tmp1)
AND col3 < 100)
SELECT *
FROM tmp2
WHERE col1 IN (SELECT col1
FROM tmp1 -- this is the tmp1 modification into tmp1'
WHERE col3 > 10)
```
at this stage Calcite's doesn't know if you want to do a table spool of tmp1
and make tmp1' derived from tmp1 --> thus col3 > 10 is not pushed down.
only when we decided that table spool is not possible and then we decided to
copy tmp1 and tmp1' into 2 separate sub queries, at that time i think the hep
optimizer already get passed the filter merging phase.
rewriting the query in this way
```
WITH tmp1
AS (SELECT *
FROM a
WHERE col2 NOT IN ( 'foo', 'bar' )),
tmp2
AS (SELECT *
FROM b
WHERE col1 IN (SELECT col1
FROM tmp1)
AND col3 < 100),
tmp3 -- here we explicitly tell that tmp3 and tmp1 are not related
AS (SELECT *
FROM a
WHERE col2 NOT IN ( 'foo', 'bar' ) AND col3 > 10),
SELECT *
FROM tmp2
WHERE col1 IN (SELECT col1
FROM tmp3)
```
produces no multi-filter plan
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]