Quanlong Huang created IMPALA-15038:
---------------------------------------

             Summary: Track cardinality of GroupingAggregators
                 Key: IMPALA-15038
                 URL: https://issues.apache.org/jira/browse/IMPALA-15038
             Project: IMPALA
          Issue Type: New Feature
          Components: Backend, Frontend
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


IMPALA-14600 tracks cardinality of AggregationNodes which are shown in the 
ExecSummary. Each AggregationNode could have multiple GroupingAggregators. Its 
output cardinality is the sum of output cardinalities of these 
GroupingAggregators. Tracking cardinality of individual GroupingAggregator 
helps to reuse just this part of the stats.

For instance, the following query have 3 GroupingAggregators in 01:AGGREGATE 
and 04:AGGREGATE:
{code:sql}
select year, month, count(id)
  from functional.alltypes
  group by rollup(year, month);{code}
The query plan:
{noformat}
02:AGGREGATE [FINALIZE]
|  output: aggif(...
|  row-size=20B cardinality=27
|
04:AGGREGATE [FINALIZE]
|  Class 0
|    output: count:merge(id)
|    group by: `year`, `month`
|  Class 1
|    output: count:merge(id)
|    group by: `year`, NULL
|  Class 2
|    output: count:merge(id)
|    group by: NULL, NULL
|  row-size=48B cardinality=27
|
03:EXCHANGE [...
|
01:AGGREGATE [STREAMING]
|  Class 0
|    output: count(id)
|    group by: `year`, `month`
|  Class 1
|    output: count(id)
|    group by: `year`, NULL
|  Class 2
|    output: count(id)
|    group by: NULL, NULL
|  row-size=48B cardinality=81
|
00:SCAN HDFS [functional.alltypes]{noformat}
In the ExecSummay, we can get the cardinality of 04:AGGREGATE is 27:
{noformat}
Operator              #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. #Rows   
Peak Mem  Est. Peak Mem  Detail                                                 
                                                                                
                                                                                
                              
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
F02:ROOT                   1      1   94.843us   94.843us                       
4.01 MB        4.00 MB                                                          
                                                                                
                                                                                
                             
05:EXCHANGE                1      1   74.074us   74.074us     27          27  
112.00 KB       16.00 KB  UNPARTITIONED                                         
                                                                                
                                                                                
                               
F01:EXCHANGE SENDER        3      3   65.638us   98.061us                      
255.00 B       96.00 KB                                                         
                                                                                
                                                                                
                              
02:AGGREGATE               3      3  426.476us  576.215us     27          27    
2.11 MB       10.00 MB  FINALIZE                                                
                                                                                
                                                                                
                             
04:AGGREGATE               3      3    2.504ms    2.680ms     27          27    
6.33 MB       30.00 MB  FINALIZE                                                
                                                                                
                                                                                
                             
03:EXCHANGE                3      3   33.216us   35.615us     33          81   
48.00 KB       16.00 KB  HASH(CASE valid_tid(1,2,3) WHEN 1 THEN 
murmur_hash(`year`) WHEN 2 THEN murmur_hash(`year`) WHEN 3 THEN 
murmur_hash(NULL) END,CASE valid_tid(1,2,3) WHEN 1 THEN murmur_hash(`month`) 
WHEN 2 THEN murmur_hash(NULL) WHEN 3 THEN murmur_hash(NULL) END) 
F00:EXCHANGE SENDER        3      3  135.704us  196.780us                       
4.37 KB      720.00 KB                                                          
                                                                                
                                                                                
                             
01:AGGREGATE               3      3    1.953ms    2.456ms     33          81    
6.07 MB       30.00 MB  STREAMING                                               
                                                                                
                                                                                
                             
00:SCAN HDFS               3      3   11.379ms   11.550ms  7.30K       7.30K  
464.00 KB      160.00 MB  functional.alltypes     {noformat}
In the profile, we can break it down into GroupingAggregators. E.g. here is one 
instance:
{noformat}
       AGGREGATION_NODE (id=4):
           - RowsReturned: 11 (11)
          GroupingAggregator 0:
             - RowsReturned: 9 (9)
          GroupingAggregator 1:
             - RowsReturned: 1 (1)
          GroupingAggregator 2:
             - RowsReturned: 1 (1){noformat}
Then reuse the cardinality if the GroupingAggregator matches other queries, e.g.
{code:sql}
select year, month, count(id)
  from functional.alltypes
  group by year, month;{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to