[GitHub] incubator-quickstep issue #96: QUICKSTEP-43 Introduced DestroyAggregationSta...

hbdeshmukh Tue, 23 Aug 2016 14:11:17 -0700

Github user hbdeshmukh commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/96
  
    Sure. Right now we are using a single threaded implementation for the 
FinalizeAggregation operator. It does two things - 
    1. Merging the multiple aggregation hash tables in one single global hash 
table. Note that same group by key can be present in different hash tables, 
hence the merging step is necessary.
    2. Finalizing the aggregate value. e.g. in case of AVG, we delay the 
computation of actual average until very last. Until that point we keep the 
running sum and count.
    
    There are multiple ways in which the FinalizeAggregation can be 
parallelized. One of them is divide the aggregation hash tables such that one 
hash table belongs to a given partition. We partition an input tuple using the 
group by key attributes and route that tuple to the appropriate hash table. 
Now, there are no overlapping keys across any two hash tables, hence the step 1 
mentioned above is not necessary. As step 2 naturally relies on the output of 
1, it can be done independently for each hash table. Therefore 1 and 2 can be 
parallelized.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #96: QUICKSTEP-43 Introduced DestroyAggregationSta...

Reply via email to