[ https://issues.apache.org/jira/browse/IMPALA-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029030#comment-17029030 ]
Tamas Mate commented on IMPALA-8423:
------------------------------------
Hi [qualong],
I tried to reproduce this plan on master without success; it looks like
IMPALA-7957 resolved this as well. Could you help me clarify this?
Tested with the following table/query and result:
{code:java}
CREATE TABLE default.test (int_col INT) PARTITIONED BY (id INT);
INSERT INTO default.test PARTITION (id=1) VALUES (1);
INSERT INTO default.test PARTITION (id=2) VALUES (2);
INSERT INTO default.test PARTITION (id=3) VALUES (2);
INSERT INTO default.test PARTITION (id=4) VALUES (2);
INSERT INTO default.test PARTITION (id=4) VALUES (2);
COMPUTE STATS default.test;
SET NUM_NODES=1;
EXPLAIN SELECT t.id, t.int_col
FROM default.test t
LEFT JOIN
(SELECT id, int_col
FROM default.test) t2
ON (t.id = t2.id)
WHERE t.int_col = t.id
UNION ALL
VALUES (NULL, NULL);
+------------------------------------------------------------+
| Explain String                                             |
+------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=2.95MB Threads=3 |
| Per-Host Resource Estimates: Memory=67MB                   |
| Codegen disabled by planner                                |
|                                                            |
| PLAN-ROOT SINK                                             |
| |                                                          |
| 00:UNION                                                   |
| |  constant-operands=1                                     |
| |  row-size=8B cardinality=2                               |
| |                                                          |
| 03:HASH JOIN [RIGHT OUTER JOIN]                            |
| |  hash predicates: id = t.id                              |
| |  runtime filters: RF000 <- t.id                          |
| |  row-size=12B cardinality=1                              |
| |                                                          |
| |--01:SCAN HDFS [default.test t]                           |
| |     HDFS partitions=4/4 files=5 size=10B                 |
| |     predicates: t.int_col = t.id                         |
| |     row-size=8B cardinality=1                            |
| |                                                          |
| 02:SCAN HDFS [default.test]                                |
|    HDFS partitions=4/4 files=5 size=10B                    |
|    runtime filters: RF000 -> id                            |
|    row-size=4B cardinality=5                               |
+------------------------------------------------------------+
{code}
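To make the condition from the issue description concrete, here is a minimal Python sketch of such a rule. It is not Impala's planner code: `PlanNode`, `eq`, `enforced_predicates`, and `remove_redundant_selects` are hypothetical names, and predicates are modeled as unordered column pairs so that `t.id = t.int_col` and `t.int_col = t.id` compare equal. The sketch assumes that a predicate enforced in a scan still holds above an outer join only when the scan feeds the row-preserving side, since those columns are never NULL-extended.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set

# Hypothetical model: an equality predicate is an unordered pair of columns.
Predicate = FrozenSet[str]


def eq(a: str, b: str) -> Predicate:
    """Build an order-insensitive equality predicate (a = b is the same as b = a)."""
    return frozenset((a, b))


@dataclass
class PlanNode:
    kind: str  # "SCAN", "SELECT", "OUTER_JOIN" or "UNION" in this sketch
    predicates: Set[Predicate] = field(default_factory=set)
    children: List["PlanNode"] = field(default_factory=list)
    # OUTER_JOIN only: index of the child whose rows are all preserved.
    preserved_child: Optional[int] = None


def enforced_predicates(node: PlanNode) -> Set[Predicate]:
    """Predicates guaranteed to hold for every row that `node` produces."""
    if node.kind == "SCAN":
        return set(node.predicates)
    if node.kind == "SELECT":
        return set(node.predicates) | enforced_predicates(node.children[0])
    if node.kind == "OUTER_JOIN" and node.preserved_child is not None:
        # Only the preserved side's guarantees survive; the other side
        # may be NULL-extended, which can invalidate its predicates.
        return enforced_predicates(node.children[node.preserved_child])
    return set()


def remove_redundant_selects(node: PlanNode) -> PlanNode:
    """Drop SELECT nodes whose predicates are all enforced below them."""
    node.children = [remove_redundant_selects(c) for c in node.children]
    if node.kind == "SELECT" and node.predicates <= enforced_predicates(node.children[0]):
        return node.children[0]
    return node


# The single-node plan from the issue description, in this toy model.
scan_t = PlanNode("SCAN", {eq("t.int_col", "t.id")})   # 01:SCAN with predicate
scan_t2 = PlanNode("SCAN")                             # 02:SCAN
join = PlanNode("OUTER_JOIN", children=[scan_t2, scan_t], preserved_child=1)
select = PlanNode("SELECT", {eq("t.id", "t.int_col")}, [join])  # 04:SELECT
plan = remove_redundant_selects(PlanNode("UNION", children=[select]))
assert plan.children[0] is join  # the SELECT node is gone
```

A SELECT whose predicate is enforced only on the non-preserved side of the join is kept by this sketch, because NULL-extended rows could still fail it.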
> Add rule to remove useless SELECT node
> --------------------------------------
>
> Key: IMPALA-8423
> URL: https://issues.apache.org/jira/browse/IMPALA-8423
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Quanlong Huang
> Assignee: Tamas Mate
> Priority: Major
> Labels: performance
>
> We can add some rules to optimize the plan after we choose the cheapest plan
> based on cost. For example, one useful rule would be "removing useless SELECT
> nodes".
> Impala generates a useless SELECT node for the following query:
> {code:sql}
> SELECT t.id, t.int_col
> FROM functional.alltypestiny t
> LEFT JOIN
> (SELECT id, int_col
> FROM functional.alltypestiny) t2
> ON (t.id = t2.id)
> WHERE t.int_col = t.id
> UNION ALL
> VALUES (NULL, NULL){code}
> Its single node plan is
> {code:java}
> PLAN-ROOT SINK
> |
> 00:UNION
> |  constant-operands=1
> |  row-size=8B cardinality=1
> |
> 04:SELECT
> |  predicates: t.id = t.int_col
> |  row-size=12B cardinality=0
> |
> 03:HASH JOIN [RIGHT OUTER JOIN]
> |  hash predicates: id = t.id
> |  runtime filters: RF000 <- t.id
> |  row-size=12B cardinality=1
> |
> |--01:SCAN HDFS [functional.alltypestiny t]
> |     HDFS partitions=4/4 files=4 size=460B
> |     predicates: t.int_col = t.id
> |     row-size=8B cardinality=1
> |
> 02:SCAN HDFS [functional.alltypestiny]
>    HDFS partitions=4/4 files=4 size=460B
>    runtime filters: RF000 -> id
>    row-size=4B cardinality=8{code}
> The SELECT node (id=04) is useless since its only predicate "t.id =
> t.int_col" has been enforced in the SCAN node (id=01) which is the right hand
> side of the RIGHT OUTER JOIN. The SELECT node won't filter out any more rows.