Lei Chang created HAWQ-1255:
-------------------------------
Summary: Looks "segment size with penalty" number in "explain
analyze" not correct
Key: HAWQ-1255
URL: https://issues.apache.org/jira/browse/HAWQ-1255
Project: Apache HAWQ
Issue Type: Bug
Components: Query Execution
Reporter: Lei Chang
Assignee: Lei Chang
"segment size" is about 500MB, while "segment size with penalty" is about
100MB. Looks not reasonable.
How to reproduce:
on laptop, 1G tpch data, lineitem table is created as hash distributed with 2
buckets, and orders table is randomly.
```
postgres=# explain analyze SELECT l_orderkey, count(l_quantity)
FROM
lineitem_b2, orders
WHERE
l_orderkey = o_orderkey
GROUP BY
l_orderkey;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 2:1 (slice2; segments: 2) (cost=291580.96..318527.67
rows=1230576 width=16)
Rows out: Avg 1500000.0 rows x 1 workers at destination.
Max/Last(seg-1:changlei.local/seg-1:changlei.local) 1500000/1500000 rows with
2209/2209 ms to first row, 2577/2577 ms to end, start offset by 1.429/1.429 ms.
-> HashAggregate (cost=291580.96..318527.67 rows=615288 width=16)
Group By: lineitem_b2.l_orderkey
Rows out: Avg 750000.0 rows x 2 workers.
Max/Last(seg1:changlei.local/seg1:changlei.local) 750000/750000 rows with
2243/2243 ms to first row, 2498/2498 ms to end, start offset by 2.615/2.615 ms.
Executor memory: 56282K bytes avg, 56282K bytes max
(seg1:changlei.local).
-> Hash Join (cost=70069.00..250010.38 rows=3000608 width=15)
Hash Cond: lineitem_b2.l_orderkey = orders.o_orderkey
Rows out: Avg 3000607.5 rows x 2 workers.
Max/Last(seg0:changlei.local/seg1:changlei.local) 3001300/2999915 rows with
350/350 ms to first row, 1611/1645 ms to end, start offset by 3.819/3.816 ms.
Executor memory: 49153K bytes avg, 49153K bytes max
(seg1:changlei.local).
Work_mem used: 23438K bytes avg, 23438K bytes max
(seg1:changlei.local). Workfile: (0 spilling, 0 reused)
(seg0) Hash chain length 1.7 avg, 3 max, using 434205 of
524341 buckets.
-> Append-only Scan on lineitem_b2 (cost=0.00..89923.15
rows=3000608 width=15)
Rows out: Avg 3000607.5 rows x 2 workers.
Max/Last(seg0:changlei.local/seg1:changlei.local) 3001300/2999915 rows with
4.460/4.757 ms to first row, 546/581 ms to end, start offset by 350/349 ms.
-> Hash (cost=51319.00..51319.00 rows=750000 width=8)
Rows in: Avg 750000.0 rows x 2 workers.
Max/Last(seg1:changlei.local/seg0:changlei.local) 750000/750000 rows with
341/344 ms to end, start offset by 8.114/5.610 ms.
-> Redistribute Motion 2:2 (slice1; segments: 2)
(cost=0.00..51319.00 rows=750000 width=8)
Hash Key: orders.o_orderkey
Rows out: Avg 750000.0 rows x 2 workers at
destination. Max/Last(seg1:changlei.local/seg0:changlei.local) 750000/750000
rows with 0.052/2.461 ms to first row, 207/207 ms to end, start offset by
8.114/5.611 ms.
-> Append-only Scan on orders (cost=0.00..21319.00
rows=750000 width=8)
Rows out: Avg 750000.0 rows x 2 workers.
Max/Last(seg1:changlei.local/seg0:changlei.local) 750000/750000 rows with
4.773/4.987 ms to first row, 166/171 ms to end, start offset by 2.911/2.697 ms.
Slice statistics:
(slice0) Executor memory: 281K bytes.
(slice1) Executor memory: 319K bytes avg x 2 workers, 319K bytes max
(seg1:changlei.local).
(slice2) Executor memory: 105773K bytes avg x 2 workers, 105773K bytes
max (seg1:changlei.local). Work_mem: 23438K bytes max.
Statement statistics:
Memory used: 262144K bytes
Settings: default_hash_table_bucket_number=2
Dispatcher statistics:
executors used(total/cached/new connection): (4/4/0); dispatcher
time(total/connection/dispatch data): (0.171 ms/0.000 ms/0.048 ms).
dispatch data time(max/min/avg): (0.030 ms/0.004 ms/0.011 ms); consume
executor data time(max/min/avg): (0.014 ms/0.009 ms/0.011 ms); free executor
time(max/min/avg): (0.000 ms/0.000 ms/0.000 ms).
Data locality statistics:
data locality ratio: 1.000; virtual segment number: 2; different host
number: 1; virtual segment number per host(avg/min/max): (2/2/2); segment
size(avg/min/max): (593580028.000 B/593495232 B/593664824 B); segment size with
penalty(avg/min/max): (103526336.000 B/103517072 B/103535600 B);
continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 0.094 ms;
resource allocation: 4.503 ms; datalocality calculation: 1.493 ms.
Total runtime: 2660.146 ms
(34 rows)
```
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)