cryptoe opened a new pull request, #15368:
URL: https://github.com/apache/druid/pull/15368

   Fixes a bug where the MSQ controller task continued to hold its task slot 
even after a cancel was issued. 
   The cause was a deadlock during worker launch: the main thread was waiting 
for worker tasks to spawn, while the cancel thread was waiting for tasks to 
finish. 
   The fix instructs the `MSQWorkerTaskLauncher` thread to stop creating new 
tasks on cancel, which unblocks the main thread and lets it release the slot. 
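
   For readers unfamiliar with this kind of deadlock, here is a minimal, 
self-contained sketch of the idea. It is not the code in this PR: 
`SketchLauncher` and all of its methods are hypothetical stand-ins, and the 
real `MSQWorkerTaskLauncher` is considerably more involved. The sketch only 
illustrates how a stop flag that is checked by both the launch path and the 
waiting thread can release a waiter that would otherwise block forever. 

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a worker task launcher (not the real Druid class).
class SketchLauncher {
    private final Object lock = new Object();
    private int launched = 0;         // guarded by lock
    private boolean stopped = false;  // guarded by lock

    // Launch path: refuses to create new tasks once stop() has been called.
    boolean tryLaunchOne() {
        synchronized (lock) {
            if (stopped) {
                return false;
            }
            launched++;
            lock.notifyAll();
            return true;
        }
    }

    // Main thread: waits until `target` workers exist OR a stop was requested.
    // Without the `!stopped` condition this wait could never end after a cancel.
    // Returns true only if all requested workers were launched.
    boolean awaitWorkers(int target) throws InterruptedException {
        synchronized (lock) {
            while (launched < target && !stopped) {
                lock.wait();
            }
            return launched >= target;
        }
    }

    // Cancel thread: stop creating tasks and wake up any waiting thread.
    void stop() {
        synchronized (lock) {
            stopped = true;
            lock.notifyAll();
        }
    }

    // Small demo: returns true if the waiter is released by stop() without
    // ever seeing the full set of workers.
    static boolean demo() {
        SketchLauncher launcher = new SketchLauncher();
        launcher.tryLaunchOne(); // only one of the two requested workers launches

        boolean[] allLaunched = new boolean[1];
        CountDownLatch done = new CountDownLatch(1);
        Thread mainThread = new Thread(() -> {
            try {
                allLaunched[0] = launcher.awaitWorkers(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.countDown();
        });
        mainThread.start();

        launcher.stop(); // simulated cancel
        try {
            return done.await(5, TimeUnit.SECONDS) && !allLaunched[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

   In this sketch the waiter returns `false` after a stop, which the caller 
can treat as a signal to clean up and release whatever it holds. 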
   
   Also short-circuited the taskRetriable check. It now runs in the 
`MSQWorkerTaskLauncher` thread rather than in the main event loop thread, 
so a task that is deemed non-retriable fails faster. 
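
   As a rough illustration of running the retriable check where the failure 
is first observed, the sketch below decides on the spot whether a failed 
worker should be relaunched or should fail the job. Everything here is an 
assumption made for illustration: `RetryCheckSketch`, `onWorkerFailed`, the 
predicate, and the error code strings are hypothetical and do not mirror the 
actual Druid classes or fault codes. 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of a failure handler that runs on the launcher thread.
class RetryCheckSketch {
    private final Predicate<String> isRetriable; // hypothetical retriable check
    final List<String> relaunchQueue = new ArrayList<>();
    String fatalError = null;

    RetryCheckSketch(Predicate<String> isRetriable) {
        this.isRetriable = isRetriable;
    }

    // Called as soon as a failed worker is observed. Returns true if the
    // launcher should keep going, false if the job should fail immediately.
    boolean onWorkerFailed(String workerId, String errorCode) {
        if (!isRetriable.test(errorCode)) {
            // Short circuit: record the failure right away instead of
            // deferring the decision to another thread's event loop.
            fatalError = workerId + ": " + errorCode;
            return false;
        }
        relaunchQueue.add(workerId);
        return true;
    }
}
```

   With a predicate such as `code -> !code.equals("FATAL")`, a worker that 
fails with the made-up code `"TRANSIENT"` is queued for relaunch, while one 
that fails with `"FATAL"` stops the launcher immediately. 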
   
   
   This PR has:
   
   - [x] been self-reviewed.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

