Harshad Deshmukh created QUICKSTEP-57:
-----------------------------------------

             Summary: FinalizeAggregation Performance Improvement
                 Key: QUICKSTEP-57
                 URL: https://issues.apache.org/jira/browse/QUICKSTEP-57
             Project: Apache Quickstep
          Issue Type: Improvement
          Components: Relational Operators, Storage
            Reporter: Harshad Deshmukh
            Assignee: Harshad Deshmukh


GROUP BY aggregation currently proceeds in two steps:
1. Aggregate tuples from StorageBlocks into separate hash tables (performed by the 
Aggregation operator). The number of hash tables is the same as the number of 
worker threads, and each thread uses only one hash table at a time. 
2. Merge the various aggregation hash tables into one (performed by the 
Finalize Aggregation operator).

Step 2 is needed because the same GROUP BY key can be present in multiple 
hash tables, and the payloads for that key must be merged. 
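
As a rough illustration of the two steps above, here is a minimal Python sketch for a SUM aggregate. It is not Quickstep code: the function names (aggregate_block, finalize) and the representation of a block as a list of (key, value) pairs are assumptions made for the example, and the per-worker tables are filled sequentially rather than by real threads.

```python
from collections import defaultdict

def aggregate_block(block, table):
    """Step 1 (hypothetical): fold one block of (key, value) tuples
    into a worker-local hash table, summing values per GROUP BY key."""
    for key, value in block:
        table[key] += value

def finalize(tables):
    """Step 2 (hypothetical): merge the worker-local tables into one,
    combining the partial sums of keys that occur in several tables."""
    merged = defaultdict(int)
    for table in tables:
        for key, partial in table.items():
            merged[key] += partial
    return dict(merged)

# Two blocks, each aggregated into its own table (one table per worker).
blocks = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
tables = [defaultdict(int) for _ in blocks]
for block, table in zip(blocks, tables):
    aggregate_block(block, table)

# Key "a" appears in both tables, so the merge step is required.
result = finalize(tables)
```

The merge in finalize visits every entry of every table, which is the cost this issue aims to avoid.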

We can avoid step 2 if the hash tables built in step 1 have no overlap in 
their GROUP BY keys. One way to achieve this is to partition the tuples being 
aggregated based on their GROUP BY keys, so that every tuple with a given key 
lands in the same hash table. 
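
The partitioning idea can be sketched as follows, again in Python and again as an illustration only. The names partition_of and partitioned_aggregate, the use of Python's built-in hash, and the choice of four partitions are all assumptions for the example; the issue does not specify how Quickstep would partition or how threads would be assigned to partitions.

```python
from collections import defaultdict

def partition_of(key, num_partitions):
    """Hypothetical partitioning function: map a GROUP BY key to a table index."""
    return hash(key) % num_partitions

def partitioned_aggregate(blocks, num_partitions):
    """Route each tuple to the table that owns its key, so no key
    can ever appear in more than one table."""
    tables = [defaultdict(int) for _ in range(num_partitions)]
    for block in blocks:
        for key, value in block:
            tables[partition_of(key, num_partitions)][key] += value
    return tables

blocks = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
tables = partitioned_aggregate(blocks, num_partitions=4)

# The final result is just the union of the tables; no payloads are merged.
result = {}
for table in tables:
    result.update(table)
```

Because the key sets are disjoint, the final result is a plain union of the tables. In a multi-threaded setting, each partition would presumably need either a single owning thread or some form of synchronization, which this sketch does not model.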




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
