Quanlong Huang created IMPALA-15038:
---------------------------------------
Summary: Track cardinality of GroupingAggregators
Key: IMPALA-15038
URL: https://issues.apache.org/jira/browse/IMPALA-15038
Project: IMPALA
Issue Type: New Feature
Components: Backend, Frontend
Reporter: Quanlong Huang
Assignee: Quanlong Huang
IMPALA-14600 tracks cardinality of AggregationNodes which are shown in the
ExecSummary. Each AggregationNode could have multiple GroupingAggregators. Its
output cardinality is the sum of output cardinalities of these
GroupingAggregators. Tracking cardinality of individual GroupingAggregator
helps to reuse just this part of the stats.
For instance, the following query have 3 GroupingAggregators in 01:AGGREGATE
and 04:AGGREGATE:
{code:sql}
select year, month, count(id)
from functional.alltypes
group by rollup(year, month);{code}
The query plan:
{noformat}
02:AGGREGATE [FINALIZE]
| output: aggif(...
| row-size=20B cardinality=27
|
04:AGGREGATE [FINALIZE]
| Class 0
| output: count:merge(id)
| group by: `year`, `month`
| Class 1
| output: count:merge(id)
| group by: `year`, NULL
| Class 2
| output: count:merge(id)
| group by: NULL, NULL
| row-size=48B cardinality=27
|
03:EXCHANGE [...
|
01:AGGREGATE [STREAMING]
| Class 0
| output: count(id)
| group by: `year`, `month`
| Class 1
| output: count(id)
| group by: `year`, NULL
| Class 2
| output: count(id)
| group by: NULL, NULL
| row-size=48B cardinality=81
|
00:SCAN HDFS [functional.alltypes]{noformat}
In the ExecSummay, we can get the cardinality of 04:AGGREGATE is 27:
{noformat}
Operator #Hosts #Inst Avg Time Max Time #Rows Est. #Rows
Peak Mem Est. Peak Mem Detail
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
F02:ROOT 1 1 94.843us 94.843us
4.01 MB 4.00 MB
05:EXCHANGE 1 1 74.074us 74.074us 27 27
112.00 KB 16.00 KB UNPARTITIONED
F01:EXCHANGE SENDER 3 3 65.638us 98.061us
255.00 B 96.00 KB
02:AGGREGATE 3 3 426.476us 576.215us 27 27
2.11 MB 10.00 MB FINALIZE
04:AGGREGATE 3 3 2.504ms 2.680ms 27 27
6.33 MB 30.00 MB FINALIZE
03:EXCHANGE 3 3 33.216us 35.615us 33 81
48.00 KB 16.00 KB HASH(CASE valid_tid(1,2,3) WHEN 1 THEN
murmur_hash(`year`) WHEN 2 THEN murmur_hash(`year`) WHEN 3 THEN
murmur_hash(NULL) END,CASE valid_tid(1,2,3) WHEN 1 THEN murmur_hash(`month`)
WHEN 2 THEN murmur_hash(NULL) WHEN 3 THEN murmur_hash(NULL) END)
F00:EXCHANGE SENDER 3 3 135.704us 196.780us
4.37 KB 720.00 KB
01:AGGREGATE 3 3 1.953ms 2.456ms 33 81
6.07 MB 30.00 MB STREAMING
00:SCAN HDFS 3 3 11.379ms 11.550ms 7.30K 7.30K
464.00 KB 160.00 MB functional.alltypes {noformat}
In the profile, we can break it down into GroupingAggregators. E.g. here is one
instance:
{noformat}
AGGREGATION_NODE (id=4):
- RowsReturned: 11 (11)
GroupingAggregator 0:
- RowsReturned: 9 (9)
GroupingAggregator 1:
- RowsReturned: 1 (1)
GroupingAggregator 2:
- RowsReturned: 1 (1){noformat}
Then reuse the cardinality if the GroupingAggregator matches other queries, e.g.
{code:sql}
select year, month, count(id)
from functional.alltypes
group by year, month;{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)