siddharthteotia commented on issue #4372: Support for single phase and two-phase distributed hash aggregation
URL: https://github.com/apache/incubator-pinot/issues/4372#issuecomment-506191736
 
 
   @kishoreg, thanks for your response. My understanding was that no reduction
happened until the data from each processing node was sent to the broker. I
mistakenly thought that the Combine operator didn't come into the picture until
the broker. Thanks for clarifying that.
   
   So the broker (a single reducer) will do the final aggregation by combining
the partial results across nodes, right?
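To make the corrected picture concrete, here is a minimal Python sketch of single-phase aggregation for a hypothetical `GROUP BY key` / `SUM(value)` query: each segment is hash-aggregated, each server's Combine step merges its per-segment partials, and the broker acts as the single reducer over one partial per server. The function names (`aggregate_segment`, `combine`, `server_execute`, `broker_reduce`) and the `(key, value)` row layout are assumptions for illustration, not Pinot's actual classes or APIs.

```python
# Hypothetical single-phase GROUP BY key / SUM(value) pipeline.
# Names are illustrative only; they are not Pinot's classes or APIs.


def aggregate_segment(rows):
    """Per-segment hash aggregation over (key, value) rows -> {key: partial_sum}."""
    partial = {}
    for key, value in rows:
        partial[key] = partial.get(key, 0) + value
    return partial


def combine(partials):
    """Merge any number of {key: partial_sum} results into one by adding sums."""
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return merged


def server_execute(segments):
    """Server side: aggregate each segment, then Combine into one partial per server."""
    return combine(aggregate_segment(rows) for rows in segments)


def broker_reduce(server_results):
    """Broker side: the single reducer merges one partial result per server."""
    return combine(server_results)


# Two servers; each inner list is one segment of (key, value) rows.
servers = [
    [[("a", 1), ("b", 2)], [("a", 3)]],  # server 1: two segments
    [[("b", 5), ("c", 1)]],              # server 2: one segment
]
result = broker_reduce(server_execute(segments) for segments in servers)
print(result)  # {'a': 4, 'b': 7, 'c': 1}
```

Because SUM partials merge associatively, the same merge routine serves both the server-level Combine and the broker-level reduce; the broker still ends up holding every distinct group reported by every server.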
   
   Another reason I was thinking of two-phase aggregation is that, for a
low-cardinality but high-volume data set, if we can reduce everything at the
segment-server level (with a shuffle), then less data will be sent to the
broker. All of the execution-specific operator logic can be kept in the segment
servers, and the broker need not worry about running out of memory and crashing
while doing the final processing. It can still run out of memory while doing
the union, but there we can just apply a limit. Similarly, the segment servers
can support spilling, etc.
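The two-phase alternative described above can be sketched the same way. This is a hypothetical illustration of the proposal, not existing Pinot behavior: each server first aggregates its local segments (phase 1), hash-partitions its partial result by group key and ships each partition to the server that owns those keys (the shuffle), and each owner then fully reduces its keys (phase 2). Since the key sets are disjoint after the shuffle, the broker only concatenates results and applies a limit. The CRC32-based partitioning, the function names (`merge_pairs`, `owner`, `shuffle`, `broker_union`), and the SUM-only aggregation are assumptions; ORDER BY handling is ignored.

```python
import itertools
import zlib

# Hypothetical two-phase GROUP BY key / SUM(value) with a hash shuffle between servers.
# Names and the CRC32 partitioning scheme are assumptions, not Pinot APIs.


def merge_pairs(pairs):
    """Hash-aggregate (key, value) pairs into {key: sum}."""
    result = {}
    for key, value in pairs:
        result[key] = result.get(key, 0) + value
    return result


def owner(key, num_servers):
    """Stable hash partitioning: index of the server that owns `key`."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def shuffle(partials, num_servers):
    """Route every (key, partial_sum) from every server to the owning server's inbox."""
    inboxes = [[] for _ in range(num_servers)]
    for partial in partials:
        for key, value in partial.items():
            inboxes[owner(key, num_servers)].append((key, value))
    return inboxes


def broker_union(final_results, limit):
    """Key sets are disjoint after the shuffle, so the broker just concatenates up to `limit` rows."""
    rows = itertools.chain.from_iterable(r.items() for r in final_results)
    return list(itertools.islice(rows, limit))


servers = [
    [[("a", 1), ("b", 2)], [("a", 3)]],  # server 1: two segments
    [[("b", 5), ("c", 1)]],              # server 2: one segment
]
num_servers = len(servers)

# Phase 1: each server reduces all of its local segments.
partials = [merge_pairs(itertools.chain.from_iterable(segs)) for segs in servers]
# Shuffle + phase 2: each server fully reduces only the keys it owns.
finals = [merge_pairs(inbox) for inbox in shuffle(partials, num_servers)]
rows = broker_union(finals, limit=10)
print(sorted(rows))  # [('a', 4), ('b', 7), ('c', 1)]
```

The design choice here is to move the memory-heavy merge onto the servers, where it is split across nodes by key and could in principle spill to disk, leaving the broker with a lazy union that stops as soon as the limit is reached.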
   
   
   
