Re: [PR] Fix deadlock that can occur while merging group by results (druid)

via GitHub Wed, 10 Jan 2024 23:08:49 -0800


gianm commented on PR #15420:
URL: https://github.com/apache/druid/pull/15420#issuecomment-1886424908


   I'm hoping we can fix the bug without making the code too much more complex. 
There's some things that contribute to the complexity and I'm hoping we can do 
some simpler alternative:
   
   1. The new context keys add to complexity because there's no mechanism that 
ensures they are set at the appropriate places, so it's tough to understand how 
and when they're supposed to be set. They're kind of "magical".
   2. The fact that we are now cloning the response context, and are using 
these cloned response contexts to store lifecycled objects. It makes it harder 
to follow the lifecycle of the objects and ensure they are properly passed 
around and closed.
   
   As an alternative to the context keys, maybe we can either add a parameter 
to `mergeResults` like `willMergeRunners`, or maybe we can add a new method on 
the toolchest entirely like `acquireResources(boolean willMergeRunners)`. These 
are less "magical" than the context keys since Java itself will ensure that the 
parameters are passed in at the appropriate times. Makes it a lot easier to 
follow what's going on.
   
   As to the response context, I don't really understand why we're cloning it. 
What's the reason for that? In any case, a possible alternative could be to put 
a unique key in the query context (just a string) and then have the various 
parts of the query use that key to get their resources from something that's 
injected (like the merge buffer pool is currently injected). Or, an alternative 
could be to eliminate the cloning and have a requirement that only one 
`mergeRunners` stack be running at once. I think that's in fact true…
   
   Lastly, for testing, maybe I missed it but I didn't see a test case for the 
Broker-side scenario. The following query is the simplest one I could come up 
with that exhibits the problem. Perhaps `ClientQuerySegmentWalkerTest` would be 
a good place to put a test using a query like this.
   
   ```json
   {
     "queryType": "groupBy",
     "dataSource": {
       "type": "query",
       "query": {
         "queryType": "groupBy",
         "dataSource": {
           "type": "inline",
           "columnNames": []
           "rows": []
         },
         "intervals": 
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z",
         "granularity": "all"
       }
     },
     "intervals": 
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z",
     "granularity": "all"
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Fix deadlock that can occur while merging group by results (druid)

Reply via email to