LakshSingla commented on PR #15420: URL: https://github.com/apache/druid/pull/15420#issuecomment-1880421795
Adding a comment here as well to aid the code walkthrough, and for anyone revisiting the PR:

The number of merge buffers required to execute a group by query is calculated from the structure of the query. Several levels of merging can happen. Per my understanding, a raw segment gets aggregated by passing through the following stages:

1. Each individual segment gets aggregated using `GroupByQueryEngineV2#process`. This doesn't use the shared merge buffers. By definition it is not nested, as it operates on individual segments (on the data server).
2. The aggregated results from multiple segments get partially merged and combined into a single runner by `GroupByMergingQueryRunnerV2`, which is sent to the broker. _This uses the shared merge buffers: one or two, depending on the value of the config `numParallelCombineThreads`._ Also, nested calls to this code don't use additional merge buffers; they go through the much more expensive `ChainedExecutionQueryRunner`.
3. The server calls an additional `QueryToolChest#mergeResults` on the resulting runner to further aggregate the data. This doesn't use the merge buffers, because the historical doesn't receive subqueries (see the caveat below).
4. The broker fetches the results from multiple data servers and "merges" them using `CachingClusteredClient.SpecificQueryRunnable#run`. This "merge" doesn't aggregate the result objects; it orders the different sequence objects sequentially.
5. The broker then calls the final `QueryToolChest#mergeResults`, which is then decorated further.

Steps 1-3 happen on the data servers, while steps 4-5 happen on the brokers. It is worth noting that `GroupByQueryRunnerFactory#mergeRunners` can take up 1-2 merge buffers depending on the value of the config `numParallelCombineThreads`, and `GroupByQueryQueryToolChest` can take up 0-3 merge buffers depending on the query structure (subqueries and subtotals).
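As a rough illustration of the counts described above, here is a minimal Java sketch. The class and method names are hypothetical and are not Druid's API. The thresholds (a second buffer when `numParallelCombineThreads` is greater than 1, at most two buffers for nested group by layers, and one more for subtotals) are assumptions chosen to match the 1-2 and 0-3 ranges mentioned above, not a statement of what the PR implements.

```java
/**
 * Hypothetical sketch of the merge buffer accounting described in the comment.
 * None of these names exist in Druid; the rules are assumptions that reproduce
 * the 1-2 (mergeRunners) and 0-3 (mergeResults) ranges.
 */
public class MergeBufferSketch {

    /** Buffers taken by the mergeRunners stage (step 2): assumed 2 when combining in parallel, else 1. */
    static int mergeRunnersBuffers(int numParallelCombineThreads) {
        return numParallelCombineThreads > 1 ? 2 : 1;
    }

    /**
     * Buffers taken by the mergeResults stage (steps 3 and 5).
     * Assumption: each group by layer beyond the innermost needs one buffer, capped at 2,
     * and a subtotals spec needs one more.
     */
    static int mergeResultsBuffers(int nestedGroupByLayers, boolean hasSubtotals) {
        int forSubqueries = Math.min(Math.max(nestedGroupByLayers, 0), 2);
        return forSubqueries + (hasSubtotals ? 1 : 0);
    }

    /** Total for a nested call stack of the form mergeResults(...mergeRunners(...)). */
    static int nestedCallStackBuffers(int numParallelCombineThreads,
                                      int nestedGroupByLayers,
                                      boolean hasSubtotals) {
        return mergeRunnersBuffers(numParallelCombineThreads)
            + mergeResultsBuffers(nestedGroupByLayers, hasSubtotals);
    }

    public static void main(String[] args) {
        // Plain group by on a data server: only mergeRunners acquires buffers.
        System.out.println("simple query: " + nestedCallStackBuffers(1, 0, false));
        // Broker-side merge of a nested query with subtotals: only mergeResults applies.
        System.out.println("broker, nested + subtotals: " + mergeResultsBuffers(2, true));
    }
}
```

Keeping the two stages as separate functions mirrors the point of the walkthrough: each stage contributes its own count, and the counts add up only where one stage is nested inside the other.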
The flow described above is the idealized case, where there is no nested call between `mergeResults` and `mergeRunners`, and therefore a single place where the merge buffers are acquired. However, there are two esoteric cases where this is not true:

1. The historicals get subqueries. Historicals normally get only the innermost query, and the broker does further processing on the returned results of that innermost query. However, if the flag `forcePushDownNestedQuery` is set to true, the historical can receive a nested query. Steps 2 and 3 in the flow above would then both acquire merge buffers.
2. The broker operates on an inlined data source. The broker then emulates a part of the historical's stack, and has a call stack like `mergeResults(mergeRunners(...))` (see `LocalQuerySegmentWalker`).

Therefore, in places where there's a nested call stack like `mergeResults(...mergeRunners(...))`, the code acquires merge buffers in two places. This is true in:

1. Data servers, in all cases (`mergeResults` can acquire 0 buffers if the subquery and subtotals are null, as in most cases; however, the subquery can be non-null if it was pushed down with the other query. Subtotals are always null.)
2. Broker, when the query is to be run on the historicals (again, `mergeResults` can acquire exactly 0 buffers, depending on the structure, but the nested call stack is still there).

The only place where we don't have a nested call stack is when the broker merges the results from the historicals: `mergeRunners` is called by the historicals, and `mergeResults` is called on the "combined" version of the runner returned by the historicals.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org