[
https://issues.apache.org/jira/browse/IMPALA-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537472#comment-16537472
]
ASF subversion and git services commented on IMPALA-6031:
---------------------------------------------------------
Commit 880011fa1f2fc48f9972ad3c673a4bff838cc5ce in impala's branch
refs/heads/master from poojanilangekar
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=880011f ]
IMPALA-6031: Fix executor node count in distributed plans
Prior to this change, the planner counted coordinator-only nodes as
executors when estimating the number of nodes that would execute
scans in the distributed plan. This change ensures that only
executor nodes are considered in that estimate.
Testing:
Added a new custom cluster test to verify the corrected behavior.
Change-Id: I44af6b40099a495e13a0a5dc72c491d486d23aa8
Reviewed-on: http://gerrit.cloudera.org:8080/10873
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
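To make the before-and-after concrete, here is a minimal Java sketch of the
idea behind the fix. This is illustrative only, not Impala's actual planner
code; the class and method names are invented for the example.
{noformat}
// Hypothetical sketch only -- not Impala's actual planner code. It
// illustrates the idea behind the fix: when estimating how many hosts a
// scan will run on, count only executor nodes, not coordinator-only ones.
import java.util.List;

class ScanHostEstimate {
  static final class Node {
    final String hostname;
    final boolean isExecutor;  // false for a coordinator-only node
    Node(String hostname, boolean isExecutor) {
      this.hostname = hostname;
      this.isExecutor = isExecutor;
    }
  }

  // Pre-fix behavior: every cluster node was treated as a scan host.
  static int estimateScanHostsBuggy(List<Node> cluster) {
    return cluster.size();
  }

  // Post-fix behavior: only executor nodes are counted.
  static int estimateScanHostsFixed(List<Node> cluster) {
    return (int) cluster.stream().filter(n -> n.isExecutor).count();
  }

  public static void main(String[] args) {
    List<Node> cluster = List.of(
        new Node("coord-1", false),  // dedicated coordinator
        new Node("exec-1", true),
        new Node("exec-2", true),
        new Node("exec-3", true));
    System.out.println(estimateScanHostsBuggy(cluster));  // 4
    System.out.println(estimateScanHostsFixed(cluster));  // 3
  }
}
{noformat}
With one coordinator-only node and three executors, the buggy count is 4,
which matches the hosts=4 printed on the scan fragments in the plan quoted
below, while the corrected count is 3, which matches the #Hosts column of
the runtime summary.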
> Distributed plan describes coordinator-only nodes as scanning
> -------------------------------------------------------------
>
> Key: IMPALA-6031
> URL: https://issues.apache.org/jira/browse/IMPALA-6031
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.11.0
> Reporter: Jim Apple
> Assignee: Pooja Nilangekar
> Priority: Major
>
> In a cluster with one coordinator-only node and three executor-only nodes:
> {noformat}
> Query: explain select count(*) from web_sales a, web_sales b where a.ws_order_number = b.ws_order_number and a.ws_item_sk = b.ws_item_sk
> +------------------------------------------------------------------------------------------+
> | Explain String                                                                           |
> +------------------------------------------------------------------------------------------+
> | Per-Host Resource Reservation: Memory=136.00MB                                           |
> | Per-Host Resource Estimates: Memory=3.04GB                                               |
> |                                                                                          |
> | F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                    |
> | PLAN-ROOT SINK                                                                           |
> | | mem-estimate=0B mem-reservation=0B                                                     |
> | |                                                                                        |
> | 07:AGGREGATE [FINALIZE]                                                                  |
> | | output: count:merge(*)                                                                 |
> | | mem-estimate=10.00MB mem-reservation=0B                                                |
> | | tuple-ids=2 row-size=8B cardinality=1                                                  |
> | |                                                                                        |
> | 06:EXCHANGE [UNPARTITIONED]                                                              |
> | mem-estimate=0B mem-reservation=0B                                                       |
> | tuple-ids=2 row-size=8B cardinality=1                                                    |
> |                                                                                          |
> | F02:PLAN FRAGMENT [HASH(a.ws_item_sk,a.ws_order_number)] hosts=4 instances=4             |
> | DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, UNPARTITIONED]                               |
> | | mem-estimate=0B mem-reservation=0B                                                     |
> | 03:AGGREGATE                                                                             |
> | | output: count(*)                                                                       |
> | | mem-estimate=10.00MB mem-reservation=0B                                                |
> | | tuple-ids=2 row-size=8B cardinality=1                                                  |
> | |                                                                                        |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED]                                                   |
> | | hash predicates: a.ws_item_sk = b.ws_item_sk, a.ws_order_number = b.ws_order_number    |
> | | runtime filters: RF000 <- b.ws_item_sk, RF001 <- b.ws_order_number                     |
> | | mem-estimate=2.95GB mem-reservation=136.00MB                                           |
> | | tuple-ids=0,1 row-size=32B cardinality=720000376                                       |
> | |                                                                                        |
> | |--05:EXCHANGE [HASH(b.ws_item_sk,b.ws_order_number)]                                    |
> | | mem-estimate=0B mem-reservation=0B                                                     |
> | | tuple-ids=1 row-size=16B cardinality=720000376                                         |
> | |                                                                                        |
> | 04:EXCHANGE [HASH(a.ws_item_sk,a.ws_order_number)]                                       |
> | mem-estimate=0B mem-reservation=0B                                                       |
> | tuple-ids=0 row-size=16B cardinality=720000376                                           |
> |                                                                                          |
> | F00:PLAN FRAGMENT [RANDOM] hosts=4 instances=4                                           |
> | DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(a.ws_item_sk,a.ws_order_number)]        |
> | | mem-estimate=0B mem-reservation=0B                                                     |
> | 00:SCAN HDFS [tpcds_1000_parquet.web_sales a, RANDOM]                                    |
> | partitions=1824/1824 files=1824 size=47.08GB                                             |
> | runtime filters: RF000 -> a.ws_item_sk, RF001 -> a.ws_order_number                       |
> | table stats: 720000376 rows total                                                        |
> | column stats: all                                                                        |
> | mem-estimate=80.00MB mem-reservation=0B                                                  |
> | tuple-ids=0 row-size=16B cardinality=720000376                                           |
> |                                                                                          |
> | F01:PLAN FRAGMENT [RANDOM] hosts=4 instances=4                                           |
> | DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(b.ws_item_sk,b.ws_order_number)]        |
> | | mem-estimate=0B mem-reservation=0B                                                     |
> | 01:SCAN HDFS [tpcds_1000_parquet.web_sales b, RANDOM]                                    |
> | partitions=1824/1824 files=1824 size=47.08GB                                             |
> | table stats: 720000376 rows total                                                        |
> | column stats: all                                                                        |
> | mem-estimate=80.00MB mem-reservation=0B                                                  |
> | tuple-ids=1 row-size=16B cardinality=720000376                                           |
> +------------------------------------------------------------------------------------------+
> {noformat}
> The plan suggests the scans will run on 4 hosts, but after actually
> running the query:
> {noformat}
> summary;
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> | Operator        | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                               |
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> | 07:AGGREGATE    | 1      | 0ns      | 0ns      | 1       | 1          | 28.00 KB  | 10.00 MB      | FINALIZE                             |
> | 06:EXCHANGE     | 1      | 0ns      | 0ns      | 3       | 1          | 0 B       | 0 B           | UNPARTITIONED                        |
> | 03:AGGREGATE    | 3      | 345.33ms | 378.00ms | 3       | 1          | 139.91 KB | 10.00 MB      |                                      |
> | 02:HASH JOIN    | 3      | 90.39s   | 97.03s   | 720.00M | 720.00M    | 2.57 GB   | 2.95 GB       | INNER JOIN, PARTITIONED              |
> | |--05:EXCHANGE  | 3      | 4.48s    | 4.65s    | 720.00M | 720.00M    | 0 B       | 0 B           | HASH(b.ws_item_sk,b.ws_order_number) |
> | | 01:SCAN HDFS  | 3      | 59.31s   | 67.16s   | 720.00M | 720.00M    | 22.88 MB  | 80.00 MB      | tpcds_1000_parquet.web_sales b       |
> | 04:EXCHANGE     | 3      | 4.57s    | 4.87s    | 720.00M | 720.00M    | 0 B       | 0 B           | HASH(a.ws_item_sk,a.ws_order_number) |
> | 00:SCAN HDFS    | 3      | 21.09s   | 22.68s   | 720.00M | 720.00M    | 23.45 MB  | 80.00 MB      | tpcds_1000_parquet.web_sales a       |
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> {noformat}
> It looks to me like the distributed plan thinks the coordinator will scan,
> but the coordinator does not scan.