[
https://issues.apache.org/jira/browse/IMPALA-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029106#comment-17029106
]
Tim Armstrong commented on IMPALA-8423:
---------------------------------------
The planner generally inserts SELECT nodes only when the predicates could not
be placed lower in the plan (it keeps track of which predicates have already
been assigned to a node). This case might just have been a consequence of a
different bug. I wonder whether there are any (important) cases where the
planner inserts an unnecessary SELECT node that are not caused by a bug in the
predicate placement algorithm.
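
The bookkeeping described above can be sketched roughly as follows. This is a
minimal illustration, not Impala's actual frontend code: PlanNode, Predicate,
assign_predicates and add_select_if_needed are hypothetical names, and the
sketch ignores the outer-join restrictions that the real planner has to
respect when pushing predicates down.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Predicate:
    """Hypothetical predicate: its text plus the tuples it references."""
    text: str
    tuple_ids: frozenset


@dataclass
class PlanNode:
    """Hypothetical plan node; not Impala's PlanNode class."""
    name: str
    tuple_ids: frozenset
    children: list = field(default_factory=list)
    predicates: list = field(default_factory=list)


def assign_predicates(node, predicates, assigned):
    """Place each predicate at the lowest node that produces every tuple it needs."""
    for child in node.children:
        assign_predicates(child, predicates, assigned)
    for pred in predicates:
        if pred not in assigned and pred.tuple_ids <= node.tuple_ids:
            node.predicates.append(pred)
            assigned.add(pred)  # remember it so no higher node repeats it


def add_select_if_needed(root, predicates, assigned):
    """Add a SELECT node only for predicates that no node has claimed."""
    leftover = [p for p in predicates if p not in assigned]
    if not leftover:
        return root
    return PlanNode("SELECT", root.tuple_ids, [root], leftover)


# Simplified version of the plan shape from the issue (assumed, for illustration).
scan_t = PlanNode("SCAN t", frozenset({"t"}))
scan_t2 = PlanNode("SCAN t2", frozenset({"t2"}))
join = PlanNode("HASH JOIN", frozenset({"t", "t2"}), [scan_t2, scan_t])
pred = Predicate("t.int_col = t.id", frozenset({"t"}))

assigned = set()
assign_predicates(join, [pred], assigned)
root = add_select_if_needed(join, [pred], assigned)
```

In this sketch the predicate lands on the scan of t and is recorded as
assigned, so no SELECT node is added on top of the join. A redundant SELECT
would appear only if the assigned set were not updated or not consulted, which
matches the suspicion that the extra node comes from a placement bug.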
> Add rule to remove useless SELECT node
> --------------------------------------
>
> Key: IMPALA-8423
> URL: https://issues.apache.org/jira/browse/IMPALA-8423
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Quanlong Huang
> Assignee: Tamas Mate
> Priority: Major
> Labels: performance
>
> We can add some rules to optimize the plan after we choose the cheapest plan
> based on cost. For example, one useful rule could be "removing useless SELECT
> nodes".
> Impala generates a useless SELECT node for the following query:
> {code:sql}
> SELECT t.id, t.int_col
> FROM functional.alltypestiny t
> LEFT JOIN
>   (SELECT id, int_col
>    FROM functional.alltypestiny) t2
>   ON (t.id = t2.id)
> WHERE t.int_col = t.id
> UNION ALL
> VALUES (NULL, NULL){code}
> Its single node plan is
> {code:java}
> PLAN-ROOT SINK
> |
> 00:UNION
> |  constant-operands=1
> |  row-size=8B cardinality=1
> |
> 04:SELECT
> |  predicates: t.id = t.int_col
> |  row-size=12B cardinality=0
> |
> 03:HASH JOIN [RIGHT OUTER JOIN]
> |  hash predicates: id = t.id
> |  runtime filters: RF000 <- t.id
> |  row-size=12B cardinality=1
> |
> |--01:SCAN HDFS [functional.alltypestiny t]
> |     HDFS partitions=4/4 files=4 size=460B
> |     predicates: t.int_col = t.id
> |     row-size=8B cardinality=1
> |
> 02:SCAN HDFS [functional.alltypestiny]
>    HDFS partitions=4/4 files=4 size=460B
>    runtime filters: RF000 -> id
>    row-size=4B cardinality=8{code}
> The SELECT node (id=04) is useless since its only predicate "t.id =
> t.int_col" has already been enforced in the SCAN node (id=01), which is the
> right-hand side of the RIGHT OUTER JOIN. The SELECT node won't filter out any
> more rows.
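
The redundancy argument in the issue description can be sketched as a small
post-planning rule. This is an illustration under stated assumptions, not the
rule implemented in Impala: Node, eq, enforced_below and remove_useless_selects
are hypothetical names, predicates are assumed to be deterministic, equality is
compared order-insensitively so that "t.id = t.int_col" matches
"t.int_col = t.id", and a predicate is treated as still holding above a join
only when it comes from a child whose rows the join never NULL-extends.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical plan node for illustration; not Impala's PlanNode."""
    kind: str
    predicates: set = field(default_factory=set)
    children: list = field(default_factory=list)
    # Indexes of children whose rows this node never NULL-extends.
    # Empty by default, which is the conservative choice (e.g. for UNION).
    preserved: tuple = ()


def eq(a, b):
    """Order-insensitive equality predicate, so a = b matches b = a."""
    return frozenset((a, b))


def enforced_below(node):
    """Predicates guaranteed to hold on every row that node emits."""
    result = set(node.predicates)
    for i in node.preserved:
        result |= enforced_below(node.children[i])
    return result


def remove_useless_selects(node):
    """Drop SELECT nodes whose predicates are all enforced by their input."""
    node.children = [remove_useless_selects(c) for c in node.children]
    if node.kind == "SELECT" and node.predicates <= enforced_below(node.children[0]):
        return node.children[0]
    return node


# The plan from the issue. The join's own hash predicate is deliberately left
# out of its predicate set, because an outer join condition does not hold on
# NULL-extended rows.
scan_t = Node("SCAN t", {eq("t.int_col", "t.id")})
scan_t2 = Node("SCAN t2")
join = Node("HASH JOIN [RIGHT OUTER JOIN]", children=[scan_t2, scan_t], preserved=(1,))
select = Node("SELECT", {eq("t.id", "t.int_col")}, [join], preserved=(0,))
union = Node("UNION", children=[select])

plan = remove_useless_selects(union)
```

After the rule runs, the UNION reads directly from the hash join. If the scan
of t were instead on the NULL-extended side of the join, preserved would not
include it and the SELECT node would be kept.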