[ 
https://issues.apache.org/jira/browse/IMPALA-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029030#comment-17029030
 ] 

Tamas Mate commented on IMPALA-8423:
------------------------------------

Hi [qualong],

I tried to reproduce this plan on master without success. It looks like 
IMPALA-7957 resolved this as well. Could you help me confirm this?

I tested with the following table and query, and got this result:
{code:java}
CREATE TABLE default.test (int_col INT) PARTITIONED BY (id INT);
INSERT INTO default.test PARTITION (id=1) VALUES (1);
INSERT INTO default.test PARTITION (id=2) VALUES (2);
INSERT INTO default.test PARTITION (id=3) VALUES (2);
INSERT INTO default.test PARTITION (id=4) VALUES (2);
INSERT INTO default.test PARTITION (id=4) VALUES (2);

COMPUTE STATS default.test;

SET NUM_NODES=1;

EXPLAIN SELECT t.id, t.int_col
FROM default.test t
LEFT JOIN
  (SELECT id, int_col
  FROM default.test) t2
ON (t.id = t2.id)
WHERE t.int_col = t.id
UNION ALL
VALUES (NULL, NULL);

+------------------------------------------------------------+
| Explain String                                             |
+------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=2.95MB Threads=3 |
| Per-Host Resource Estimates: Memory=67MB                   |
| Codegen disabled by planner                                |
|                                                            |
| PLAN-ROOT SINK                                             |
| |                                                          |
| 00:UNION                                                   |
| |  constant-operands=1                                     |
| |  row-size=8B cardinality=2                               |
| |                                                          |
| 03:HASH JOIN [RIGHT OUTER JOIN]                            |
| |  hash predicates: id = t.id                              |
| |  runtime filters: RF000 <- t.id                          |
| |  row-size=12B cardinality=1                              |
| |                                                          |
| |--01:SCAN HDFS [default.test t]                           |
| |     HDFS partitions=4/4 files=5 size=10B                 |
| |     predicates: t.int_col = t.id                         |
| |     row-size=8B cardinality=1                            |
| |                                                          |
| 02:SCAN HDFS [default.test]                                |
|    HDFS partitions=4/4 files=5 size=10B                    |
|    runtime filters: RF000 -> id                            |
|    row-size=4B cardinality=5                               |
+------------------------------------------------------------+
{code}
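
To check plans like the one above mechanically rather than by eye, a small helper can list the plan nodes in an explain string and report whether a SELECT node is present. This is only a sketch: the names `plan_nodes` and `has_node` are hypothetical, and the regular expression assumes that node lines look like `NN:NODE NAME`, as they do in the output above.

```python
import re

# Assumption: a plan node appears as "<digits>:<UPPERCASE NAME>", e.g.
# "03:HASH JOIN [RIGHT OUTER JOIN]" or "|--01:SCAN HDFS [default.test t]".
NODE_RE = re.compile(r"\b(\d+):([A-Z][A-Z ]*[A-Z])")


def plan_nodes(explain_text):
    """Return (node_id, node_name) pairs in the order they appear."""
    nodes = []
    for line in explain_text.splitlines():
        match = NODE_RE.search(line)
        if match:
            nodes.append((int(match.group(1)), match.group(2)))
    return nodes


def has_node(explain_text, name):
    """True if any plan node in the explain text has the given name."""
    return any(node_name == name for _, node_name in plan_nodes(explain_text))


# Excerpt of the plan above, used only as sample input.
excerpt = """\
| 00:UNION                                                   |
| |  constant-operands=1                                     |
| 03:HASH JOIN [RIGHT OUTER JOIN]                            |
| |--01:SCAN HDFS [default.test t]                           |
| 02:SCAN HDFS [default.test]                                |
"""
select_present = has_node(excerpt, "SELECT")
```

With the plan above as input, `select_present` is false, which matches the observation that the extra SELECT node no longer shows up on master.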

> Add rule to remove useless SELECT node
> --------------------------------------
>
>                 Key: IMPALA-8423
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8423
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Quanlong Huang
>            Assignee: Tamas Mate
>            Priority: Major
>              Labels: performance
>
> We can add some rules to optimize the plan after we choose the cheapest plan 
> based on cost. For example, one useful rule would be "removing useless SELECT 
> nodes".
> Impala generates a useless SELECT node for the following query:
> {code:sql}
> SELECT t.id, t.int_col
> FROM functional.alltypestiny t
> LEFT JOIN
>   (SELECT id, int_col
>   FROM functional.alltypestiny) t2
> ON (t.id = t2.id)
> WHERE t.int_col = t.id
> UNION ALL
> VALUES (NULL, NULL){code}
> Its single-node plan is:
> {code:java}
> PLAN-ROOT SINK
> |
> 00:UNION
> |  constant-operands=1
> |  row-size=8B cardinality=1
> |
> 04:SELECT
> |  predicates: t.id = t.int_col
> |  row-size=12B cardinality=0
> |
> 03:HASH JOIN [RIGHT OUTER JOIN]
> |  hash predicates: id = t.id
> |  runtime filters: RF000 <- t.id
> |  row-size=12B cardinality=1
> |
> |--01:SCAN HDFS [functional.alltypestiny t]
> |     HDFS partitions=4/4 files=4 size=460B
> |     predicates: t.int_col = t.id
> |     row-size=8B cardinality=1
> |
> 02:SCAN HDFS [functional.alltypestiny]
>    HDFS partitions=4/4 files=4 size=460B
>    runtime filters: RF000 -> id
>    row-size=4B cardinality=8{code}
> The SELECT node (id=04) is useless because its only predicate, "t.id = t.int_col", 
> is already enforced in the SCAN node (id=01), which is the right-hand side of the 
> RIGHT OUTER JOIN. The SELECT node won't filter out any more rows.
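
The redundancy argument in the issue description can be sketched as a small tree rewrite. This is a toy model, not Impala's planner code: the `Node` class, `normalize`, `enforced_below`, and `remove_useless_selects` are all hypothetical names, predicates are plain strings, and join predicates are deliberately not modeled. The assumption is that a predicate applied on the row-preserving side of an outer join still holds for every row the join emits, so a SELECT above the join that only repeats such predicates can be dropped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def normalize(pred):
    """Treat 'a = b' and 'b = a' as the same predicate (toy normalization)."""
    left, sep, right = pred.partition(" = ")
    if not sep:
        return pred.strip()
    return " = ".join(sorted((left.strip(), right.strip())))


@dataclass
class Node:
    name: str
    # Filter predicates applied to this node's output rows (scan or SELECT).
    predicates: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    # For outer joins: index of the child whose rows are all preserved.
    preserved_child: Optional[int] = None


def enforced_below(node):
    """Predicates assumed to hold for every row produced by this node."""
    held = {normalize(p) for p in node.predicates}
    if node.preserved_child is not None:
        held |= enforced_below(node.children[node.preserved_child])
    return held


def remove_useless_selects(node):
    """Drop SELECT nodes whose predicates are all enforced further down."""
    node.children = [remove_useless_selects(c) for c in node.children]
    if node.name == "SELECT" and len(node.children) == 1:
        child = node.children[0]
        if {normalize(p) for p in node.predicates} <= enforced_below(child):
            return child
    return node


# The plan from the issue description: scan 01 is the right-hand (preserved)
# side of the RIGHT OUTER JOIN and already applies the predicate.
scan_01 = Node("SCAN HDFS", predicates=["t.int_col = t.id"])
scan_02 = Node("SCAN HDFS")
join_03 = Node("HASH JOIN", children=[scan_02, scan_01], preserved_child=1)
select_04 = Node("SELECT", predicates=["t.id = t.int_col"], children=[join_03])
plan = remove_useless_selects(Node("UNION", children=[select_04]))
```

After the rewrite, the UNION's child is the HASH JOIN directly. If the predicate were enforced only on the non-preserved side, the SELECT would be kept, since rows padded with NULLs could still fail the predicate.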



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
