wirybeaver opened a new issue, #12080:
URL: https://github.com/apache/pinot/issues/12080

   When reading the source code of Pinot's GroupByExecutor, I found that it lacks the following features of Druid's GroupByV2Engine:
   1. Spill to disk for the merging buffer. [Druid SpillingGrouper](https://github.com/apache/druid/blob/9f3b26676d30f90599a7d55e43549617e0cee082/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/SpillingGrouper.java)
   2. Parallel combine when merging sorted aggregation results. Druid builds a combining tree of threads on local historical nodes. [Druid ParallelCombiner](https://github.com/apache/druid/blob/9f3b26676d30f90599a7d55e43549617e0cee082/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/ParallelCombiner.java#L64)
   ```
          o         <- non-leaf node
       / / \ \      <- ICD = 4 (intermediate combine degree)
    o   o   o   o   <- non-leaf nodes
   / \ / \ / \ / \  <- LCD = 2 (leaf combine degree)
   o o o o o o o o  <- leaf nodes
   ```
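   The combining-tree idea above can be sketched roughly as follows. This is a hypothetical simplification, not Druid's or Pinot's actual code: leaf results (modeled here as `Map<String, Long>` of group key to count) are combined `fanout` at a time per tree level, with each combine submitted as its own task, until one merged result remains.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of a parallel combining tree for group-by results.
// Names (CombiningTreeSketch, treeCombine, fanout) are illustrative only.
public class CombiningTreeSketch {

  // Merge a group of partial results by summing values per group key.
  static Map<String, Long> combine(List<Map<String, Long>> inputs) {
    Map<String, Long> out = new HashMap<>();
    for (Map<String, Long> in : inputs) {
      in.forEach((k, v) -> out.merge(k, v, Long::sum));
    }
    return out;
  }

  // Repeatedly combine `fanout` neighbors per level until one result is left;
  // each per-level combine runs as its own task on the pool.
  static Map<String, Long> treeCombine(List<Map<String, Long>> leaves, int fanout)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Map<String, Long>> level = leaves;
      while (level.size() > 1) {
        List<Future<Map<String, Long>>> futures = new ArrayList<>();
        for (int i = 0; i < level.size(); i += fanout) {
          List<Map<String, Long>> group =
              level.subList(i, Math.min(i + fanout, level.size()));
          futures.add(pool.submit(() -> combine(group)));
        }
        List<Map<String, Long>> next = new ArrayList<>();
        for (Future<Map<String, Long>> f : futures) {
          next.add(f.get());
        }
        level = next;
      }
      return level.get(0);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // 8 leaf results, mirroring the diagram above with fanout 2 per level.
    List<Map<String, Long>> leaves = new ArrayList<>();
    for (int i = 0; i < 8; i++) {
      Map<String, Long> m = new HashMap<>();
      m.put("a", 1L);
      m.put("b", (long) i);
      leaves.add(m);
    }
    Map<String, Long> merged = treeCombine(leaves, 2);
    System.out.println(merged); // a=8, b=28
  }
}
```

   A real implementation would also stream rows between levels instead of materializing whole maps, and use different combine degrees at the leaf and intermediate levels as in the diagram.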
   
   Reference: [Druid GroupBy Tuning 
Guide](https://druid.apache.org/docs/latest/querying/groupbyquery/)
   
   As the tuning guide mentions, Druid always sorts the aggregation result by default when limit pushdown is not enabled. I have a strong feeling that integrating a disk-spill feature would allow Pinot to process data at a much larger scale and resolve the nondeterministic results for GROUP BY without ORDER BY, i.e. https://github.com/apache/pinot/issues/11706. In addition, the non-leaf stage in the multistage (V2) engine could also adopt these two features for partitioned aggregation.
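   For the spill side, here is a rough sketch of the technique (again hypothetical, not Druid's SpillingGrouper itself): keep a bounded in-memory map of group key to count, write it out as a key-sorted run when it fills, and merge the runs at the end. Because each run is sorted, a real implementation would stream a k-way merge with a priority queue; this sketch re-aggregates into a `TreeMap` at the end just to keep it short.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of spill-to-disk aggregation with bounded memory.
// Names (SpillingAggregatorSketch, maxGroupsInMemory) are illustrative only.
public class SpillingAggregatorSketch {
  private final int maxGroupsInMemory;
  private final TreeMap<String, Long> inMemory = new TreeMap<>();
  private final List<Path> spillFiles = new ArrayList<>();

  SpillingAggregatorSketch(int maxGroupsInMemory) {
    this.maxGroupsInMemory = maxGroupsInMemory;
  }

  // Aggregate one row; spill the buffer once it reaches the cap.
  void add(String key, long value) throws IOException {
    inMemory.merge(key, value, Long::sum);
    if (inMemory.size() >= maxGroupsInMemory) {
      spill();
    }
  }

  // Write the in-memory buffer as a key-sorted "key\tvalue" run, then clear it.
  private void spill() throws IOException {
    Path file = Files.createTempFile("groupby-spill", ".txt");
    try (BufferedWriter w = Files.newBufferedWriter(file)) {
      for (Map.Entry<String, Long> e : inMemory.entrySet()) {
        w.write(e.getKey() + "\t" + e.getValue());
        w.newLine();
      }
    }
    spillFiles.add(file);
    inMemory.clear();
  }

  // Merge all spilled runs plus the remaining buffer into the final result.
  // (A production version would k-way merge the sorted runs with a priority
  // queue and stream the output instead of building a map.)
  TreeMap<String, Long> finish() throws IOException {
    if (!inMemory.isEmpty()) {
      spill();
    }
    TreeMap<String, Long> result = new TreeMap<>();
    for (Path file : spillFiles) {
      try (BufferedReader r = Files.newBufferedReader(file)) {
        String line;
        while ((line = r.readLine()) != null) {
          int tab = line.indexOf('\t');
          result.merge(line.substring(0, tab),
              Long.parseLong(line.substring(tab + 1)), Long::sum);
        }
      }
      Files.delete(file);
    }
    return result;
  }
}
```

   The key property for #11706 is that the merged output comes back key-sorted regardless of how many runs were spilled, which is what makes the result deterministic.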
   
   I'm raising this issue to solicit opinions from folks. If there's sufficient support, I will write a design doc for leaf-stage group-by execution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

