gianm opened a new pull request, #18931:
URL: https://github.com/apache/druid/pull/18931

   Patch #18095 replaced various previously-existing worker cancellation 
mechanisms with an interrupt of the worker thread. Unfortunately, debugging 
some stuck tests revealed at least one scenario where an InterruptedException 
could be swallowed while the main worker thread handles a new work order. This 
defeated the cancellation mechanism, leaving the worker running forever.
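
   To illustrate the failure mode, here is a minimal, self-contained sketch. 
It is not the Druid code: `handleWorkOrder`, `workerExitsAfterInterrupt`, and 
the class name are hypothetical stand-ins. It assumes a worker that handles one 
work order and then blocks waiting for more commands, and shows how catching 
InterruptedException without rethrowing it or restoring the interrupt flag 
leaves the thread blocked with no pending interrupt.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

class SwallowedInterruptDemo {
  // Hypothetical stand-in for the code that handles a new work order.
  static void handleWorkOrder(boolean swallow) throws InterruptedException {
    try {
      // Any blocking call; throws InterruptedException if the thread is interrupted.
      Thread.sleep(10_000);
    } catch (InterruptedException e) {
      if (swallow) {
        // Bug: the exception is dropped and the interrupt flag is already cleared,
        // so the caller never learns that cancellation was requested.
        return;
      }
      throw e; // Correct: let the interrupt propagate to the main loop.
    }
  }

  // Returns true if the worker thread exits shortly after being interrupted.
  static boolean workerExitsAfterInterrupt(boolean swallow) throws InterruptedException {
    CountDownLatch started = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      try {
        started.countDown();
        handleWorkOrder(swallow);
        // Stand-in for the main loop waiting on its next command. Nothing is ever
        // added to this queue, so only an interrupt can unblock it.
        new LinkedBlockingQueue<Runnable>().take();
      } catch (InterruptedException e) {
        // Cancellation observed: fall through and exit.
      }
    });
    worker.setDaemon(true); // do not keep the JVM alive if the worker is stuck
    worker.start();
    started.await();
    worker.interrupt();
    worker.join(500);
    return !worker.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("propagating: exits = " + workerExitsAfterInterrupt(false));
    System.out.println("swallowing:  exits = " + workerExitsAfterInterrupt(true));
  }
}
```

   In this sketch the propagating variant should exit promptly, while the 
swallowing variant stays blocked in `take()`, which is the "runs forever" 
behavior described above.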
   
   This patch does three things to improve robustness:
   
   1) Fix the specific code path in RunAllFullyWidget that was swallowing 
InterruptedException in the problematic test.
   
   2) Add a non-interrupt-based cancellation mechanism: a stop() call that 
throws an exception into the main thread's kernel manipulation queue. This is 
useful as a failsafe in case the interrupt gets lost in some other code path 
that may not have been discovered yet.
   
   3) Update MSQTestBase to wait for workers to exit before moving on to the 
next test, and fail if they don't exit within 10 seconds. This is preferable to 
the tests running forever and possibly polluting the shared executor.
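
   Items 2 and 3 can be sketched together as follows. This is an illustrative 
sketch under stated assumptions, not the code in this patch: `KernelCommand`, 
`StoppedException`, `Worker`, and `awaitExit` are hypothetical names, and the 
worker loop deliberately swallows interrupts to simulate the lost-interrupt 
case. The `stop()` call enqueues a command that throws on the worker's own 
thread, and the caller then waits for the thread to exit with a bounded timeout.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueStopDemo {
  // Hypothetical command that manipulates kernel state on the main worker thread.
  interface KernelCommand {
    void apply() throws Exception;
  }

  // Hypothetical exception used only to break out of the worker loop.
  static class StoppedException extends Exception {
    StoppedException() {
      super("worker stopped");
    }
  }

  static class Worker implements Runnable {
    private final BlockingQueue<KernelCommand> commands = new LinkedBlockingQueue<>();

    @Override
    public void run() {
      try {
        while (true) {
          KernelCommand command;
          try {
            command = commands.take();
          } catch (InterruptedException e) {
            // Deliberately simulates a code path that loses the interrupt.
            continue;
          }
          command.apply();
        }
      } catch (Exception e) {
        // A command threw (here: StoppedException), so the loop ends.
      }
    }

    // Failsafe that does not depend on interrupts: the throwing command is
    // executed by the worker thread itself the next time it polls the queue.
    void stop() {
      commands.add(() -> {
        throw new StoppedException();
      });
    }
  }

  // Waits up to timeoutMillis for the thread to exit; returns true if it did.
  static boolean awaitExit(Thread thread, long timeoutMillis) throws InterruptedException {
    thread.join(timeoutMillis);
    return !thread.isAlive();
  }

  // Returns {exited after interrupt only, exited after stop()}.
  static boolean[] demo() throws InterruptedException {
    Worker worker = new Worker();
    Thread thread = new Thread(worker, "hypothetical-worker");
    thread.setDaemon(true);
    thread.start();

    thread.interrupt(); // swallowed by the loop above
    boolean exitedAfterInterrupt = awaitExit(thread, 200);

    worker.stop();
    // A test harness could fail the test here if this returns false.
    boolean exitedAfterStop = awaitExit(thread, 10_000);
    return new boolean[] {exitedAfterInterrupt, exitedAfterStop};
  }

  public static void main(String[] args) throws InterruptedException {
    boolean[] result = demo();
    System.out.println("exited after interrupt: " + result[0]);
    System.out.println("exited after stop():    " + result[1]);
  }
}
```

   The design point is that the stop signal travels through the same queue the 
main thread already polls, so it is still delivered even if an interrupt was 
consumed somewhere else.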


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

