[ https://issues.apache.org/jira/browse/IMPALA-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong reassigned IMPALA-6031:
-------------------------------------
Assignee: Pooja Nilangekar
> Distributed plan describes coordinator-only nodes as scanning
> -------------------------------------------------------------
>
> Key: IMPALA-6031
> URL: https://issues.apache.org/jira/browse/IMPALA-6031
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.11.0
> Reporter: Jim Apple
> Assignee: Pooja Nilangekar
> Priority: Major
>
> In a cluster with one coordinator-only node and three executor-only nodes:
> {noformat}
> Query: explain select count(*) from web_sales a, web_sales b where
>     a.ws_order_number = b.ws_order_number and a.ws_item_sk = b.ws_item_sk
>
> Per-Host Resource Reservation: Memory=136.00MB
> Per-Host Resource Estimates: Memory=3.04GB
>
> F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B
> |
> 07:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  mem-estimate=10.00MB mem-reservation=0B
> |  tuple-ids=2 row-size=8B cardinality=1
> |
> 06:EXCHANGE [UNPARTITIONED]
>    mem-estimate=0B mem-reservation=0B
>    tuple-ids=2 row-size=8B cardinality=1
>
> F02:PLAN FRAGMENT [HASH(a.ws_item_sk,a.ws_order_number)] hosts=4 instances=4
> DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, UNPARTITIONED]
> |  mem-estimate=0B mem-reservation=0B
> 03:AGGREGATE
> |  output: count(*)
> |  mem-estimate=10.00MB mem-reservation=0B
> |  tuple-ids=2 row-size=8B cardinality=1
> |
> 02:HASH JOIN [INNER JOIN, PARTITIONED]
> |  hash predicates: a.ws_item_sk = b.ws_item_sk, a.ws_order_number = b.ws_order_number
> |  runtime filters: RF000 <- b.ws_item_sk, RF001 <- b.ws_order_number
> |  mem-estimate=2.95GB mem-reservation=136.00MB
> |  tuple-ids=0,1 row-size=32B cardinality=720000376
> |
> |--05:EXCHANGE [HASH(b.ws_item_sk,b.ws_order_number)]
> |     mem-estimate=0B mem-reservation=0B
> |     tuple-ids=1 row-size=16B cardinality=720000376
> |
> 04:EXCHANGE [HASH(a.ws_item_sk,a.ws_order_number)]
>    mem-estimate=0B mem-reservation=0B
>    tuple-ids=0 row-size=16B cardinality=720000376
>
> F00:PLAN FRAGMENT [RANDOM] hosts=4 instances=4
> DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(a.ws_item_sk,a.ws_order_number)]
> |  mem-estimate=0B mem-reservation=0B
> 00:SCAN HDFS [tpcds_1000_parquet.web_sales a, RANDOM]
>    partitions=1824/1824 files=1824 size=47.08GB
>    runtime filters: RF000 -> a.ws_item_sk, RF001 -> a.ws_order_number
>    table stats: 720000376 rows total
>    column stats: all
>    mem-estimate=80.00MB mem-reservation=0B
>    tuple-ids=0 row-size=16B cardinality=720000376
>
> F01:PLAN FRAGMENT [RANDOM] hosts=4 instances=4
> DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(b.ws_item_sk,b.ws_order_number)]
> |  mem-estimate=0B mem-reservation=0B
> 01:SCAN HDFS [tpcds_1000_parquet.web_sales b, RANDOM]
>    partitions=1824/1824 files=1824 size=47.08GB
>    table stats: 720000376 rows total
>    column stats: all
>    mem-estimate=80.00MB mem-reservation=0B
>    tuple-ids=1 row-size=16B cardinality=720000376
> {noformat}
> It looks like the scans are going to be on 4 hosts, but actually, after
> running the query:
> {noformat}
> summary;
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> | Operator        | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                               |
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> | 07:AGGREGATE    | 1      | 0ns      | 0ns      | 1       | 1          | 28.00 KB  | 10.00 MB      | FINALIZE                             |
> | 06:EXCHANGE     | 1      | 0ns      | 0ns      | 3       | 1          | 0 B       | 0 B           | UNPARTITIONED                        |
> | 03:AGGREGATE    | 3      | 345.33ms | 378.00ms | 3       | 1          | 139.91 KB | 10.00 MB      |                                      |
> | 02:HASH JOIN    | 3      | 90.39s   | 97.03s   | 720.00M | 720.00M    | 2.57 GB   | 2.95 GB       | INNER JOIN, PARTITIONED              |
> | |--05:EXCHANGE  | 3      | 4.48s    | 4.65s    | 720.00M | 720.00M    | 0 B       | 0 B           | HASH(b.ws_item_sk,b.ws_order_number) |
> | | 01:SCAN HDFS  | 3      | 59.31s   | 67.16s   | 720.00M | 720.00M    | 22.88 MB  | 80.00 MB      | tpcds_1000_parquet.web_sales b       |
> | 04:EXCHANGE     | 3      | 4.57s    | 4.87s    | 720.00M | 720.00M    | 0 B       | 0 B           | HASH(a.ws_item_sk,a.ws_order_number) |
> | 00:SCAN HDFS    | 3      | 21.09s   | 22.68s   | 720.00M | 720.00M    | 23.45 MB  | 80.00 MB      | tpcds_1000_parquet.web_sales a       |
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> {noformat}
> It looks to me like the distributed plan thinks the coordinator will scan,
> but the coordinator does not scan.
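> In other words, the mismatch seems to be that the planner counts every cluster node as a potential scan host, while the scheduler assigns scan ranges only to executors. A minimal sketch of that reading (hypothetical names, not Impala internals):

```python
# Hypothetical sketch of the reported mismatch; Node/is_executor are
# illustrative names, not Impala internals.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    is_executor: bool  # False for a coordinator-only node

# The cluster from the report: one coordinator-only node, three executors.
cluster = [
    Node("coord-1", is_executor=False),
    Node("exec-1", is_executor=True),
    Node("exec-2", is_executor=True),
    Node("exec-3", is_executor=True),
]

# What the distributed plan prints for the scan fragments (hosts=4):
planned_scan_hosts = len(cluster)

# What the scheduler actually uses (#Hosts=3 in the summary):
actual_scan_hosts = sum(n.is_executor for n in cluster)

print(planned_scan_hosts, actual_scan_hosts)  # 4 3
```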
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)