[
https://issues.apache.org/jira/browse/IMPALA-12204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731918#comment-17731918
]
ASF subversion and git services commented on IMPALA-12204:
----------------------------------------------------------
Commit 47309d14ca6bd274dd72674e12092f6dd3e034f3 in impala's branch
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=47309d14c ]
IMPALA-12204: Fix redundant codegen info added in subplan profiles
The SUBPLAN node opens its right child node many times in its
GetNext(), once for each row produced by its left child. The right
child of a SUBPLAN node is a subtree of operators. These operators
should not add codegen info to the profile in their Open() method,
since Open() is invoked repeatedly.
Currently, DataSink and UnionNode have this issue. This patch fixes
them by adding the codegen info to the profile in Close() instead of
Open(), as was done in IMPALA-11200.
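The effect of moving the reporting from Open() to Close() can be
illustrated with a minimal, standalone sketch. It is not the Impala
code: Profile, Node, AppendExecOption() and RunSubplan() are
hypothetical names made up for this illustration, and the sketch
assumes the right child is opened once per left-side row and closed
exactly once.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a runtime profile. It only concatenates
// ExecOption strings; it is not Impala's RuntimeProfile API.
struct Profile {
  std::string exec_option;
  void AppendExecOption(const std::string& opt) {
    if (!exec_option.empty()) exec_option += ", ";
    exec_option += opt;
  }
};

// Hypothetical operator sitting in the right subtree of a SUBPLAN.
class Node {
 public:
  Node(Profile* profile, bool report_in_close)
    : profile_(profile), report_in_close_(report_in_close) {}

  // Called once per left-side row, so anything appended here repeats.
  void Open() {
    if (!report_in_close_) profile_->AppendExecOption("Codegen Enabled");
  }

  // Called once at the end, so the info is appended a single time.
  void Close() {
    if (closed_) return;
    if (report_in_close_) profile_->AppendExecOption("Codegen Enabled");
    closed_ = true;
  }

 private:
  Profile* profile_;
  bool report_in_close_;
  bool closed_ = false;
};

// Simulates a SUBPLAN that re-opens its right child for each left row
// and returns the resulting ExecOption string.
std::string RunSubplan(int left_rows, bool report_in_close) {
  assert(left_rows >= 0);
  Profile profile;
  Node right_child(&profile, report_in_close);
  for (int i = 0; i < left_rows; ++i) right_child.Open();
  right_child.Close();
  return profile.exec_option;
}
```

In this sketch, RunSubplan(n, false) yields a string that grows
linearly with n, while RunSubplan(n, true) yields a single entry
regardless of n.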
Tests:
- Add e2e tests
Change-Id: I99a0a842df63a03c61024e2b77d5118ca63a2b2d
Reviewed-on: http://gerrit.cloudera.org:8080/20037
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Csaba Ringhofer <[email protected]>
> Redundant codegen info of HashJoinBuilder inside a subplan
> ----------------------------------------------------------
>
> Key: IMPALA-12204
> URL: https://issues.apache.org/jira/browse/IMPALA-12204
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> In the query profile, the info strings of a hash join builder contain an
> ExecOption with content like "Build Side Codegen Enabled, Hash Table
> Construction Codegen Enabled". When there is a HashJoin node inside a SUBPLAN
> node, this string can be repeated many times, since the SUBPLAN node
> opens the right child many times. This can blow up the profile size.
> I can reproduce this with the following query:
> {code:sql}
> select count(*) from
> tpch_nested_parquet.customer c1,
> tpch_nested_parquet.customer c2,
> (select x.* from c1.c_orders x, c2.c_orders y
> where x.o_orderkey = y.o_orderkey) v
> where c1.c_custkey = c2.c_custkey;{code}
> In the query plan, there is a HASH JOIN node inside a SUBPLAN node:
> {noformat}
> 08:SUBPLAN
> | row-size=56B cardinality=1.50M
> |
> |--06:NESTED LOOP JOIN [CROSS JOIN]
> | | row-size=56B cardinality=10
> | |
> | |--02:SINGULAR ROW SRC
> | | row-size=40B cardinality=1
> | |
> | 05:HASH JOIN [INNER JOIN]
> | | hash predicates: x.o_orderkey = y.o_orderkey
> | | row-size=16B cardinality=10
> | |
> | |--04:UNNEST [c2.c_orders y]
> | | row-size=0B cardinality=10
> | |
> | 03:UNNEST [c1.c_orders x]
> | row-size=0B cardinality=10
> {noformat}
> The query profile has very long strings:
> {noformat}
> Hash Join Builder (join_node_id=5):
> ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen
> Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen
> Enabled,...
> {noformat}