Re: [PR] Fix deadlock that can occur while merging group by results (druid)

via GitHub Sun, 07 Jan 2024 21:54:09 -0800


LakshSingla commented on PR #15420:
URL: https://github.com/apache/druid/pull/15420#issuecomment-1880421795


   Adding a comment here as well to aid in the code walkthrough, and for anyone 
revisiting the PR:
   
   The number of merge buffers required to execute a group by query is 
calculated based on the structure of the query. There are many levels of 
merging that can happen. Per my understanding, a raw segment gets aggregated 
after passing through the following structure:
   
   1. Each individual segment gets aggregated using 
`GroupByQueryEngineV2#process`. This doesn't use the shared merge buffers. By 
definition, it is not nested, as it operates on the individual segments (on the 
data server).
   2. The aggregated results from the multiple segments get partially merged 
and combined into a single runner using the `GroupByMergingQueryRunnerV2`, 
which is sent to the broker. _This utilises the shared merge buffers: one or 
two of them, depending on the value of the config 
`numParallelCombineThreads`._ Also, nested calls to this code don't use 
additional merge buffers; they go through a much more expensive 
`ChainedExecutionQueryRunner`.
   3. The server additionally calls `QueryToolChest#mergeResults` on the 
resulting runner to further aggregate the data. This doesn't use the merge 
buffers because the historical doesn't receive subqueries (see caveat below). 
   4. The broker fetches the results from the multiple data servers, and 
"merges" them using `CachingClusteredClient.SpecificQueryRunnable#run`. This 
"merge" doesn't aggregate the result objects; it only orders the different 
sequence objects sequentially. 
   5. The broker then calls the final `QueryToolChest#mergeResults`, which is 
then decorated further.
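   
   To make the difference between steps 4 and 5 concrete, here is a minimal 
sketch that takes the description above at face value. It is not Druid code: 
`MergeStepsSketch`, `row`, `concatenate` and `aggregate` are hypothetical names, 
and a result row is modelled as a plain key/count pair.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Druid classes.
final class MergeStepsSketch {
    // A group-by result row modelled as (grouping key, partial count).
    static Map.Entry<String, Long> row(String key, long count) {
        return new AbstractMap.SimpleEntry<>(key, count);
    }

    // Step 4 analogue: stitch the per-server results together one after
    // another, without combining rows that share a grouping key.
    static List<Map.Entry<String, Long>> concatenate(
            List<List<Map.Entry<String, Long>>> perServer) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (List<Map.Entry<String, Long>> rows : perServer) {
            out.addAll(rows);
        }
        return out;
    }

    // Step 5 analogue: combine rows that share a grouping key by summing
    // their partial counts, keeping first-seen key order.
    static List<Map.Entry<String, Long>> aggregate(
            List<Map.Entry<String, Long>> rows) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Long> r : rows) {
            totals.merge(r.getKey(), r.getValue(), Long::sum);
        }
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            out.add(row(e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

   With two servers that both return a row for the same key, `concatenate` 
keeps both rows, and only `aggregate` folds them into one.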
   
   Steps 1-3 happen on the data servers, while 4-5 happen on the brokers. 
   It is worth noting that `GroupByQueryRunnerFactory#mergeRunners` can take up 
1-2 merge buffers depending on the value of the config 
`numParallelCombineThreads`, and the `GroupByQueryQueryToolChest` can take up 
0-3 merge buffers depending on the query structure (subqueries and 
subtotals).
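   
   The two ranges above can be written down as a small calculation. This is 
only a sketch: `MergeBufferBudgetSketch` and its methods are hypothetical, and 
the exact rules (2 buffers when `numParallelCombineThreads` is greater than 1, 
at most 2 buffers for nested subqueries plus 1 for subtotals) are my 
assumptions chosen to reproduce the 1-2 and 0-3 ranges, not a copy of the 
actual implementation.

```java
// Hypothetical helper; the names and the exact rules are assumptions.
final class MergeBufferBudgetSketch {
    // mergeRunners: 1 buffer, or 2 when parallel combining is in use.
    // Assumption: "in use" means numParallelCombineThreads > 1.
    static int forMergeRunners(int numParallelCombineThreads) {
        return numParallelCombineThreads > 1 ? 2 : 1;
    }

    // mergeResults: 0-3 buffers depending on subqueries and subtotals.
    // Assumption: nested subqueries need at most 2 buffers however deep
    // the nesting is, and a subtotals spec needs 1 more.
    static int forMergeResults(int subqueryDepth, boolean hasSubtotals) {
        if (subqueryDepth < 0) {
            throw new IllegalArgumentException("subqueryDepth must be >= 0");
        }
        return Math.min(subqueryDepth, 2) + (hasSubtotals ? 1 : 0);
    }

    // Buffers needed at the same time when both calls run in one process.
    static int total(int numParallelCombineThreads,
                     int subqueryDepth,
                     boolean hasSubtotals) {
        return forMergeRunners(numParallelCombineThreads)
                + forMergeResults(subqueryDepth, hasSubtotals);
    }
}
```

   Under these assumptions, a plain query without parallel combining needs a 
single buffer, while a deeply nested query with subtotals and parallel 
combining needs five.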
   
   The above describes the ideal case, where there is no nested call between 
mergeResults and mergeRunners, and therefore a single place where the merge 
buffers can be acquired. However, there are two esoteric cases where this is 
not true:
   
   1. The historicals get subqueries - Historicals normally get only the 
innermost query, and the broker does further processing on the results 
returned for it. However, if the flag `forcePushDownNestedQuery` is set to 
true, then the historical can receive a nested query. Steps 2 & 3 in the list 
above would then both acquire merge buffers. 
   2. The broker operates on an inlined data source - The broker would then 
emulate a part of the historical's stack, and the broker would have a callstack 
like mergeResults(mergeRunners..) (see `LocalQuerySegmentWalker`). 
   
   Therefore, in places where there's a nested call stack like 
mergeResults(.....mergeRunners(....)), the code acquires merge buffers in two 
places. This is true in:
   1. Data servers in all the cases (mergeResults can acquire 0 buffers if the 
subquery & subtotals are null, as in most cases; however, the subquery can be 
non-null if it was pushed down with the other query. Subtotals is always null)
   2. Broker, when the query is to be run on the historicals (again, 
mergeResults can acquire 0 buffers, depending on the structure, but the 
nested callstack is still there)
   
   The only place where we don't have a nested call stack is when the Broker 
merges the results from the historicals, wherein mergeRunners is called by 
the historicals, and mergeResults is called on the "combined" version of 
the runner returned by the historicals.
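   
   To illustrate why acquiring merge buffers in two places on one call stack 
is risky, here is a small model. It is a sketch under stated assumptions, not 
the code in this PR: a `java.util.concurrent.Semaphore` stands in for the 
shared merge buffer pool, two concurrent queries each need one buffer at the 
mergeResults level and one at the mergeRunners level, and `NestedAcquireSketch` 
and its methods are hypothetical names. `tryAcquire` is used instead of a 
blocking acquire so that the stuck state can be observed rather than hanging.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model: a Semaphore plays the role of the merge buffer pool.
final class NestedAcquireSketch {
    // Nested style: each query takes its outer buffer first and asks for
    // the inner one later. This replays the unlucky interleaving where both
    // queries take their outer buffer before either asks for the inner one.
    // Returns true if at least one query ends up holding both buffers.
    static boolean nestedAcquireMakesProgress(int poolSize) {
        Semaphore pool = new Semaphore(poolSize);
        boolean aOuter = pool.tryAcquire();           // query A, mergeResults
        boolean bOuter = pool.tryAcquire();           // query B, mergeResults
        boolean aInner = aOuter && pool.tryAcquire(); // query A, mergeRunners
        boolean bInner = bOuter && pool.tryAcquire(); // query B, mergeRunners
        // With blocking acquires, a false result here would mean both
        // queries wait forever while each holds a buffer the other needs.
        return (aOuter && aInner) || (bOuter && bInner);
    }

    // Up-front style: each query reserves everything it needs in one step,
    // so a query that cannot proceed holds nothing while it waits.
    static boolean upFrontAcquireMakesProgress(int poolSize) {
        Semaphore pool = new Semaphore(poolSize);
        boolean a = pool.tryAcquire(2); // query A reserves both buffers
        boolean b = pool.tryAcquire(2); // query B reserves both buffers
        return a || b;
    }
}
```

   With a pool of two buffers, the nested style leaves both queries holding 
one buffer each and neither can continue, while the up-front style lets one 
query run and the other wait without holding anything. Reserving everything in 
one step is one common way to avoid this kind of deadlock; the model makes no 
claim about how the PR itself resolves it.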
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org
