himanshug opened a new issue #8413: Improve GroupBy query execution to push 
limit to segment scan phase
URL: https://github.com/apache/incubator-druid/issues/8413
 
 
   ### Motivation
   
   GroupBy query execution can push limits down to queryable nodes for 
merging, which exploits a small limit to cut down on data transfer and 
processing.
   However, the limit is currently pushed down only as far as the merge phase 
on the historical; the segment scan phase does not exploit it. This slows down 
a groupBy query when the segments contain many unique rows and the query uses 
many complex aggregators.
   
   So, this proposal is to optionally enable pushing the limit all the way 
down to the segment scan during GroupBy query processing.
   
   ### Proposed changes
   I have tried a prototype on a 0.15.0 build, which is currently running in 
my Druid cluster. The following changes (combined with #8412) show tremendous 
improvements for some groupBy queries on large datasets with complex 
aggregators and an aggressive limit.
   
   - `GroupByQueryEngineV2.HashAggregateIterator.newGrouper()` is updated to 
return `LimitedBufferHashGrouper` when `query.isApplyLimitPushDown() == true`
   - `GroupByQueryEngineV2.GroupByEngineKeySerde` is updated to correctly 
implement `Grouper.BufferComparator bufferComparator()` and 
`Grouper.BufferComparator bufferComparatorWithAggregators(..)` methods.
   - A new method `Grouper.BufferComparator bufferComparator(int 
keyBufferPosition, @Nullable StringComparator stringComparator)` is added to 
`GroupByColumnSelectorStrategy`
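
   To illustrate the idea behind the changes above, here is a minimal, 
self-contained sketch of a limit-aware grouper. It is not Druid's 
`LimitedBufferHashGrouper`; the class `LimitedGrouperSketch` and its methods 
are hypothetical, and it assumes the query orders results by the grouping key 
only, with a single sum aggregator. The comparator passed in plays a role 
loosely similar to the `StringComparator` mentioned above.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration only (not Druid code). Groups rows by a string
 * key, sums a long metric, and never holds more than `limit` groups.
 * Assumption: results are ordered by the grouping key alone.
 */
final class LimitedGrouperSketch {
    private final int limit;
    private final Comparator<String> keyOrder;
    private final TreeMap<String, Long> groups;

    LimitedGrouperSketch(int limit, Comparator<String> keyOrder) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        this.keyOrder = keyOrder;
        this.groups = new TreeMap<>(keyOrder);
    }

    /** Folds one scanned row into the bounded set of groups. */
    void aggregate(String key, long value) {
        Long current = groups.get(key);
        if (current != null) {
            groups.put(key, current + value); // existing group: just aggregate
            return;
        }
        if (groups.size() < limit) {
            groups.put(key, value);           // still room for a new group
            return;
        }
        // Full: a new key survives only if it sorts before the largest kept key.
        String largest = groups.lastKey();
        if (keyOrder.compare(key, largest) < 0) {
            groups.remove(largest);
            groups.put(key, value);
        }
        // Otherwise the row is dropped; it cannot be in the first `limit` groups.
    }

    /** Returns the kept groups in key order. */
    Map<String, Long> results() {
        return new LinkedHashMap<>(groups);
    }

    public static void main(String[] args) {
        LimitedGrouperSketch grouper =
            new LimitedGrouperSketch(2, Comparator.naturalOrder());
        grouper.aggregate("c", 1);
        grouper.aggregate("a", 2);
        grouper.aggregate("b", 3);
        grouper.aggregate("c", 4);
        grouper.aggregate("a", 5);
        System.out.println(grouper.results());
    }
}
```

Because the largest kept key only ever decreases, an evicted or dropped key 
can never re-enter, so every group that survives has seen all of its rows. 
Ordering on aggregator values would not allow this kind of early discard 
without further care.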
   
   Also, to be safe, I am planning to add a query context flag 
`enableLimitPushdownToSegment` to enable this optimization.
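
   As a rough sketch of how such an opt-in flag might be read, the snippet 
below looks up `enableLimitPushdownToSegment` in a plain context map. The 
class `QueryContextSketch` and its method are hypothetical and do not use 
Druid's query context API; the default of `false` is an assumption based on 
the flag being described as a way to enable the optimization.

```java
import java.util.Map;

/** Hypothetical helper; not part of Druid. */
final class QueryContextSketch {
    static final String ENABLE_LIMIT_PUSHDOWN_TO_SEGMENT =
        "enableLimitPushdownToSegment";

    /**
     * Returns true only when the flag is explicitly set to true.
     * Accepts Boolean values or strings such as "true", since context
     * values often arrive as loosely typed JSON.
     */
    static boolean isLimitPushdownToSegmentEnabled(Map<String, ?> context) {
        Object raw = context == null
            ? null
            : context.get(ENABLE_LIMIT_PUSHDOWN_TO_SEGMENT);
        if (raw == null) {
            return false; // assumed default: optimization is off unless requested
        }
        if (raw instanceof Boolean) {
            return (Boolean) raw;
        }
        return Boolean.parseBoolean(raw.toString());
    }

    public static void main(String[] args) {
        Map<String, Object> context =
            Map.of(ENABLE_LIMIT_PUSHDOWN_TO_SEGMENT, true);
        System.out.println(isLimitPushdownToSegmentEnabled(context));
    }
}
```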
   
   ### Test plan (optional)
   There are existing tests with queries that push down limits; I will update 
them to also run with `enableLimitPushdownToSegment=true`.
   
