[
https://issues.apache.org/jira/browse/IMPALA-12204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731438#comment-17731438
]
Quanlong Huang commented on IMPALA-12204:
-----------------------------------------
I went through all the other Open() methods and found that UnionNode::Open()
can also append strings repeatedly to the profile. Code snippet:
{code:cpp}
Status UnionNode::Open(RuntimeState* state) {
  ...
  if (is_codegen_status_added_ && num_const_scalar_expr_to_be_codegened_ == 0
      && !const_exprs_lists_.empty()) {
    runtime_profile_->AppendExecOption(
        "Codegen Disabled for const scalar expressions");
  }
  return Status::OK();
}
{code}
The following query will hit the issue:
{code:sql}
select count(*) from
tpch_nested_parquet.customer c1,
tpch_nested_parquet.customer c2,
(
select x.o_orderkey from c1.c_orders x
union all
select y.o_orderkey from c2.c_orders y
union all
select 100
) v
where c1.c_custkey = c2.c_custkey;{code}
A UnionNode is inside the subplan:
{code:sql}
08:SUBPLAN
| row-size=40B cardinality=3.00M
|
|--06:NESTED LOOP JOIN [CROSS JOIN]
| | row-size=40B cardinality=20
| |
| |--02:SINGULAR ROW SRC
| | row-size=40B cardinality=1
| |
| 03:UNION
| | row-size=0B cardinality=20
| |
| |--05:UNNEST [c2.c_orders y]
| | row-size=0B cardinality=10
| |
| 04:UNNEST [c1.c_orders x]
| row-size=0B cardinality=10 {code}
The profile then shows the string "Codegen Disabled for const scalar
expressions" repeated many times:
{noformat}
UNION_NODE (id=3):
ExecOption: Codegen Disabled for const scalar expressions, Codegen Disabled
for const scalar expressions, Codegen Disabled for const scalar expressions,
Codegen Disabled for const scalar expressions,...{noformat}
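One possible way to keep the option from being appended on every Open() is to guard the call with a per-node flag. The following is only a minimal, self-contained sketch of that idea, not the actual Impala fix: SketchProfile, SketchUnionNode, exec_option_added_ and OpenRepeatedly are hypothetical names made up for illustration, and SketchProfile is a stand-in that only assumes options are joined with ", " as in the profile output above.

```cpp
#include <string>

// Hypothetical stand-in for a runtime profile (NOT Impala's RuntimeProfile).
// Assumption: exec options are concatenated with ", " between entries.
class SketchProfile {
 public:
  void AppendExecOption(const std::string& opt) {
    if (!exec_options_.empty()) exec_options_ += ", ";
    exec_options_ += opt;
  }
  const std::string& exec_options() const { return exec_options_; }

 private:
  std::string exec_options_;
};

// Hypothetical node whose Open() is called once per SUBPLAN iteration.
class SketchUnionNode {
 public:
  explicit SketchUnionNode(SketchProfile* profile) : profile_(profile) {}

  void Open() {
    // Record the option only on the first Open(); later calls skip it.
    if (!exec_option_added_) {
      profile_->AppendExecOption(
          "Codegen Disabled for const scalar expressions");
      exec_option_added_ = true;
    }
  }

 private:
  SketchProfile* profile_;
  bool exec_option_added_ = false;  // hypothetical guard flag
};

// Simulates a SUBPLAN node reopening its child `iterations` times and
// returns the resulting ExecOption string.
std::string OpenRepeatedly(int iterations) {
  SketchProfile profile;
  SketchUnionNode node(&profile);
  for (int i = 0; i < iterations; ++i) node.Open();
  return profile.exec_options();
}
```

With the guard in place, OpenRepeatedly(1000) yields the option once instead of 1000 times. Whether a flag on the node or deduplication inside the profile is the better place for this check is a design choice this sketch does not settle.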
> Redundant codegen info of HashJoinBuilder inside a subplan
> ----------------------------------------------------------
>
> Key: IMPALA-12204
> URL: https://issues.apache.org/jira/browse/IMPALA-12204
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> In the query profile, the info strings of a hash join builder contain an
> ExecOption with content like "Build Side Codegen Enabled, Hash Table
> Construction Codegen Enabled". When there is a HashJoin node inside a SUBPLAN
> node, this string can be repeated many times because the SUBPLAN node opens
> its right child many times. This can blow up the profile size.
> I can reproduce this with the following query:
> {code:sql}
> select count(*) from
> tpch_nested_parquet.customer c1,
> tpch_nested_parquet.customer c2,
> (select x.* from c1.c_orders x, c2.c_orders y
> where x.o_orderkey = y.o_orderkey) v
> where c1.c_custkey = c2.c_custkey;{code}
> In the query plan, there is a HASH JOIN node inside a SUBPLAN node:
> {noformat}
> 08:SUBPLAN
> | row-size=56B cardinality=1.50M
> |
> |--06:NESTED LOOP JOIN [CROSS JOIN]
> | | row-size=56B cardinality=10
> | |
> | |--02:SINGULAR ROW SRC
> | | row-size=40B cardinality=1
> | |
> | 05:HASH JOIN [INNER JOIN]
> | | hash predicates: x.o_orderkey = y.o_orderkey
> | | row-size=16B cardinality=10
> | |
> | |--04:UNNEST [c2.c_orders y]
> | | row-size=0B cardinality=10
> | |
> | 03:UNNEST [c1.c_orders x]
> | row-size=0B cardinality=10
> {noformat}
> The query profile has very long strings:
> {noformat}
> Hash Join Builder (join_node_id=5):
> ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen
> Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen
> Enabled,...
> {noformat}