[ 
https://issues.apache.org/jira/browse/IMPALA-12204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12204:
------------------------------------
    Description: 
In the query profile, the info strings of a hash join builder contain an 
ExecOption with content like "Build Side Codegen Enabled, Hash Table 
Construction Codegen Enabled". When there is a HashJoin node inside a SUBPLAN 
node, this string can be repeated many times, since the SUBPLAN node opens 
its right child many times. This can blow up the profile size.
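
To make the growth pattern concrete, here is a minimal toy model in Python. It
is not Impala code: ToyProfile, append_exec_option, add_exec_option_once and
run_subplan are hypothetical names, and opening the right child once per input
row is an assumption used only to get "many" opens. It contrasts an
unconditional append with a guard that records each option only once.

```python
# Toy model of a profile info string. This is NOT Impala's RuntimeProfile API;
# every name here is hypothetical and exists only for illustration.
class ToyProfile:
    def __init__(self):
        self.info_strings = {}

    def append_exec_option(self, option):
        """Append unconditionally, which matches the repeated output above."""
        current = self.info_strings.get("ExecOption", "")
        self.info_strings["ExecOption"] = (
            f"{current}, {option}" if current else option
        )

    def add_exec_option_once(self, option):
        """Hypothetical guard: skip an option that is already recorded."""
        current = self.info_strings.get("ExecOption", "")
        recorded = current.split(", ") if current else []
        if option not in recorded:
            self.append_exec_option(option)


CODEGEN_OPTIONS = [
    "Build Side Codegen Enabled",
    "Hash Table Construction Codegen Enabled",
]


def run_subplan(profile, num_opens, add_option):
    # Assumption: the right child (and its join builder) is opened num_opens
    # times, and each open reports the codegen options again.
    for _ in range(num_opens):
        for option in CODEGEN_OPTIONS:
            add_option(profile, option)
    return profile.info_strings.get("ExecOption", "")


repeated = run_subplan(ToyProfile(), 1000, ToyProfile.append_exec_option)
deduped = run_subplan(ToyProfile(), 1000, ToyProfile.add_exec_option_once)
print(len(repeated), len(deduped))
```

In this model the unguarded string grows linearly with the number of opens,
while the guarded one stays at a single copy of each option. Whether the real
fix should deduplicate or add the option only on the first open is left open
here.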

I can reproduce this by the following query:
{code:sql}
select count(*) from
  tpch_nested_parquet.customer c1,
  tpch_nested_parquet.customer c2,
  (select x.* from c1.c_orders x, c2.c_orders y
  where x.o_orderkey = y.o_orderkey) v
where c1.c_custkey = c2.c_custkey;{code}
In the query plan, there is a HASH JOIN node inside a SUBPLAN node:
{noformat}
08:SUBPLAN
|  row-size=56B cardinality=1.50M
|
|--06:NESTED LOOP JOIN [CROSS JOIN]
|  |  row-size=56B cardinality=10
|  |
|  |--02:SINGULAR ROW SRC
|  |     row-size=40B cardinality=1
|  |
|  05:HASH JOIN [INNER JOIN]
|  |  hash predicates: x.o_orderkey = y.o_orderkey
|  |  row-size=16B cardinality=10
|  |
|  |--04:UNNEST [c2.c_orders y]
|  |     row-size=0B cardinality=10
|  |
|  03:UNNEST [c1.c_orders x]
|     row-size=0B cardinality=10
 {noformat}
The query profile then contains extremely long strings:
{noformat}
Hash Join Builder (join_node_id=5):
  ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen 
Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen Enabled,...
{noformat}

  was:
In the query profile, the info strings of a hash join builder contain an 
ExecOption with content like "Build Side Codegen Enabled, Hash Table 
Construction Codegen Enabled". When there is a HashJoin node inside a SUBPLAN 
node, this string can be repeated many times, since the SUBPLAN node opens and 
closes its right child many times. This can blow up the profile size.

I can reproduce this by the following query:
{code:sql}
select count(*) from
  tpch_nested_parquet.customer c1,
  tpch_nested_parquet.customer c2,
  (select x.* from c1.c_orders x, c2.c_orders y
  where x.o_orderkey = y.o_orderkey) v
where c1.c_custkey = c2.c_custkey;{code}
In the query plan, there is a HASH JOIN node inside a SUBPLAN node:
{noformat}
08:SUBPLAN
|  row-size=56B cardinality=1.50M
|
|--06:NESTED LOOP JOIN [CROSS JOIN]
|  |  row-size=56B cardinality=10
|  |
|  |--02:SINGULAR ROW SRC
|  |     row-size=40B cardinality=1
|  |
|  05:HASH JOIN [INNER JOIN]
|  |  hash predicates: x.o_orderkey = y.o_orderkey
|  |  row-size=16B cardinality=10
|  |
|  |--04:UNNEST [c2.c_orders y]
|  |     row-size=0B cardinality=10
|  |
|  03:UNNEST [c1.c_orders x]
|     row-size=0B cardinality=10
 {noformat}
The query profile then contains extremely long strings:
{noformat}
Hash Join Builder (join_node_id=5):
  ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen 
Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen Enabled,...
{noformat}


> Redundant codegen info of HashJoinBuilder inside a subplan
> ----------------------------------------------------------
>
>                 Key: IMPALA-12204
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12204
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> In the query profile, the info strings of a hash join builder contain an 
> ExecOption with content like "Build Side Codegen Enabled, Hash Table 
> Construction Codegen Enabled". When there is a HashJoin node inside a SUBPLAN 
> node, this string can be repeated many times, since the SUBPLAN node opens 
> its right child many times. This can blow up the profile size.
> I can reproduce this by the following query:
> {code:sql}
> select count(*) from
>   tpch_nested_parquet.customer c1,
>   tpch_nested_parquet.customer c2,
>   (select x.* from c1.c_orders x, c2.c_orders y
>   where x.o_orderkey = y.o_orderkey) v
> where c1.c_custkey = c2.c_custkey;{code}
> In the query plan, there is a HASH JOIN node inside a SUBPLAN node:
> {noformat}
> 08:SUBPLAN
> |  row-size=56B cardinality=1.50M
> |
> |--06:NESTED LOOP JOIN [CROSS JOIN]
> |  |  row-size=56B cardinality=10
> |  |
> |  |--02:SINGULAR ROW SRC
> |  |     row-size=40B cardinality=1
> |  |
> |  05:HASH JOIN [INNER JOIN]
> |  |  hash predicates: x.o_orderkey = y.o_orderkey
> |  |  row-size=16B cardinality=10
> |  |
> |  |--04:UNNEST [c2.c_orders y]
> |  |     row-size=0B cardinality=10
> |  |
> |  03:UNNEST [c1.c_orders x]
> |     row-size=0B cardinality=10
>  {noformat}
> The query profile then contains extremely long strings:
> {noformat}
> Hash Join Builder (join_node_id=5):
>   ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen 
> Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen 
> Enabled,...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
