[ https://issues.apache.org/jira/browse/IMPALA-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200164#comment-17200164 ]

ASF subversion and git services commented on IMPALA-10179:
----------------------------------------------------------

Commit 6f9d9514c3c765c6e593ce0bf146041f3e561c29 in impala's branch 
refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6f9d951 ]

IMPALA-10179: After join inversion inherit the left child's parallelism

The planner makes a cost-based decision to invert a join - i.e. swap
the left and right child - based on the cardinality, row size and
parallelism of both sides. The join node's numNodes and numInstances
attributes were originally set based on the left child's values. These
were not getting reset after the inversion, so the plan showed the
original value even though the backend actually uses the correct value
from the left input; this mainly affects planner estimates.

This patch fixes this behavior by inheriting the new left child's
parallelism (numNodes and numInstances) after inversion. A common case
where the join inputs get flipped is when the original query has a
left semi join (IN, EXISTS, etc.) and the subquery has a large
cardinality compared to the outer query. This gets converted to a
right semi join.  This pattern occurs for example in TPC-DS q10, q35,
q69.
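
For illustration, the following is a minimal, standalone sketch of the idea
(plain Java; the Node/Join classes and their fields are hypothetical stand-ins,
not Impala's actual PlanNode/JoinNode API):

{noformat}
public class JoinInversionSketch {

    /** Simplified stand-in for a plan node carrying parallelism attributes. */
    static class Node {
        final String name;
        final int numNodes;      // hosts the node runs on
        final int numInstances;  // total fragment instances
        Node(String name, int numNodes, int numInstances) {
            this.name = name;
            this.numNodes = numNodes;
            this.numInstances = numInstances;
        }
    }

    /** Simplified join with a left (probe) and right (build) child. */
    static class Join {
        Node left;
        Node right;
        int numNodes;
        int numInstances;

        Join(Node left, Node right) {
            this.left = left;
            this.right = right;
            inheritLeftParallelism();
        }

        /** The join's parallelism follows its probe (left) side. */
        void inheritLeftParallelism() {
            numNodes = left.numNodes;
            numInstances = left.numInstances;
        }

        /**
         * Swap the probe and build sides. Before the fix the parallelism kept
         * the pre-inversion values; the fix is the extra call below.
         * (An alternative mentioned in the TODO would be max(left, right).)
         */
        void invert() {
            Node tmp = left;
            left = right;
            right = tmp;
            inheritLeftParallelism();  // reset to the new left child's values
        }
    }

    public static void main(String[] args) {
        // Mirrors the JIRA example: small outer input vs. large subquery input.
        Node storeReturns = new Node("store_returns scan", 1, 1);
        Node storeSales = new Node("store_sales agg", 3, 3);

        Join join = new Join(storeReturns, storeSales); // left semi join, hosts=1
        join.invert();                                  // becomes a right semi join

        System.out.printf("probe=%s, parallelism after inversion: hosts=%d instances=%d%n",
                join.left.name, join.numNodes, join.numInstances); // expect 3 / 3
    }
}
{noformat}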

Testing:
 - Updated the plans for several TPC-DS queries to reflect the modified
   parallelism.  11 queries had the join fragment 'instances' increase
   from either 1 or 2 to 6 (for planner tests hosts = 3 and mt_dop = 2).

   There were also 4 queries where the parallelism was reduced. Based on an
   initial check, these changes seem expected.

   TODO: Check if this reduction is expected based on the costing. One
   option may be to use a max(left parallelism, right parallelism) value
   to avoid unintended side effects.

 - A couple of TPC-H queries also had an increase in parallelism.
 - Ran e2e TPC-DS queries.
 - TODO: test performance at a larger scale factor.

Change-Id: I01ba559034ad76f7ccee41f237c81b29d8402950
Reviewed-on: http://gerrit.cloudera.org:8080/16480
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> After inverting a join's inputs the join's parallelism does not get reset
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-10179
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10179
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>
> In the following query, the left semi join gets flipped to a right semi join 
> due to the cardinality of the tables but the parallelism of the HashJoin 
> fragment (see Fragment F01) remains at hosts=1, instances=1.  The correct 
> behavior is to inherit the parallelism of the new probe input table 
> store_sales, so it should be hosts=3, instances=3 to avoid 
> under-parallelizing the HashJoin.
> {noformat}
> [localhost:21000] default> set explain_level=2;
> EXPLAIN_LEVEL set to 2
> [localhost:21000] default> use tpcds_parquet;
> Query: use tpcds_parquet
> [localhost:21000] tpcds_parquet> explain select count(*) from store_returns 
> where sr_customer_sk in (select ss_customer_sk from store_sales);
> Query: explain select count(*) from store_returns where sr_customer_sk in 
> (select ss_customer_sk from store_sales)
> Max Per-Host Resource Reservation: Memory=10.31MB Threads=6
> Per-Host Resource Estimates: Memory=85MB
> Analyzed query: SELECT count(*) FROM tpcds_parquet.store_returns LEFT SEMI 
> JOIN
> (SELECT ss_customer_sk FROM tpcds_parquet.store_sales) `$a$1` (`$c$1`) ON
> sr_customer_sk = `$a$1`.`$c$1`
> ""
> F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=10.02MB mem-reservation=0B 
> thread-reservation=1
> PLAN-ROOT SINK
> |  output exprs: count(*)
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 09:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=3 row-size=8B cardinality=1
> |  in pipelines: 09(GETNEXT), 04(OPEN)
> |
> 08:EXCHANGE [UNPARTITIONED]
> |  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
> |  tuple-ids=3 row-size=8B cardinality=1
> |  in pipelines: 04(GETNEXT)
> |
> F01:PLAN FRAGMENT [HASH(tpcds_parquet.store_sales.ss_customer_sk)] hosts=1 
> instances=1
> Per-Host Resources: mem-estimate=23.88MB mem-reservation=5.81MB 
> thread-reservation=1 runtime-filters-memory=1.00MB
> 04:AGGREGATE
> |  output: count(*)
> |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=3 row-size=8B cardinality=1
> |  in pipelines: 04(GETNEXT), 06(OPEN)
> |
> 03:HASH JOIN [RIGHT SEMI JOIN, PARTITIONED]
> |  hash predicates: tpcds_parquet.store_sales.ss_customer_sk = sr_customer_sk
> |  runtime filters: RF000[bloom] <- sr_customer_sk
> |  mem-estimate=2.88MB mem-reservation=2.88MB spill-buffer=128.00KB 
> thread-reservation=0
> |  tuple-ids=0 row-size=4B cardinality=287.51K
> |  in pipelines: 06(GETNEXT), 00(OPEN)
> |
> |--07:EXCHANGE [HASH(sr_customer_sk)]
> |  |  mem-estimate=1.10MB mem-reservation=0B thread-reservation=0
> |  |  tuple-ids=0 row-size=4B cardinality=287.51K
> |  |  in pipelines: 00(GETNEXT)
> |  |
> |  F02:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=24.00MB mem-reservation=1.00MB 
> thread-reservation=2
> |  00:SCAN HDFS [tpcds_parquet.store_returns, RANDOM]
> |     HDFS partitions=1/1 files=1 size=15.43MB
> |     stored statistics:
> |       table: rows=287.51K size=15.43MB
> |       columns: all
> |     extrapolated-rows=disabled max-scan-range-rows=287.51K
> |     mem-estimate=24.00MB mem-reservation=1.00MB thread-reservation=1
> |     tuple-ids=0 row-size=4B cardinality=287.51K
> |     in pipelines: 00(GETNEXT)
> |
> 06:AGGREGATE [FINALIZE]
> |  group by: tpcds_parquet.store_sales.ss_customer_sk
> |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0
> |  tuple-ids=6 row-size=4B cardinality=90.63K
> |  in pipelines: 06(GETNEXT), 01(OPEN)
> |
> 05:EXCHANGE [HASH(tpcds_parquet.store_sales.ss_customer_sk)]
> |  mem-estimate=142.01KB mem-reservation=0B thread-reservation=0
> |  tuple-ids=6 row-size=4B cardinality=90.63K
> |  in pipelines: 01(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
> Per-Host Resources: mem-estimate=27.00MB mem-reservation=3.50MB 
> thread-reservation=2 runtime-filters-memory=1.00MB
> 02:AGGREGATE [STREAMING]
> |  group by: tpcds_parquet.store_sales.ss_customer_sk
> |  mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB 
> thread-reservation=0
> |  tuple-ids=6 row-size=4B cardinality=90.63K
> |  in pipelines: 01(GETNEXT)
> |
> 01:SCAN HDFS [tpcds_parquet.store_sales, RANDOM]
>    HDFS partitions=1824/1824 files=1824 size=200.95MB
>    runtime filters: RF000[bloom] -> tpcds_parquet.store_sales.ss_customer_sk
>    stored statistics:
>      table: rows=2.88M size=200.95MB
>      partitions: 1824/1824 rows=2.88M
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=130.09K
>    mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=1
>    tuple-ids=1 row-size=4B cardinality=2.88M
>    in pipelines: 01(GETNEXT)
> {noformat}
> The same behavior is seen for inner joins as well, but reproducing it requires 
> commenting out the 'ordering' of tables for the joins in order to force an 
> initial join order that is sub-optimal. 


