gianm opened a new pull request, #18931: URL: https://github.com/apache/druid/pull/18931
Patch #18095 replaced various previously existing worker cancellation mechanisms with an interrupt of the worker thread. Unfortunately, debugging some stuck tests revealed at least one scenario where an InterruptedException can be swallowed while the main worker thread handles a new work order. When that happens, the cancellation mechanism is defeated and the worker runs forever.

This patch does three things to improve robustness:

1) Fixes the specific code path in RunAllFullyWidget that was swallowing InterruptedException in the problematic test.
2) Adds a non-interrupt-based cancellation mechanism: a stop() call that throws an exception into the main thread's kernel manipulation queue. This serves as a failsafe in case the interrupt is lost in some other code path that has not been discovered yet.
3) Updates MSQTestBase to wait for workers to exit before moving on to the next test, and to fail if they do not exit within 10 seconds. This is preferable to tests running forever and possibly polluting the shared executor.
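The following is a minimal Java sketch of the failure mode and the defenses described above. It is not Druid code: SketchWorker, KernelAction, WorkerStoppedException, sleepHandlingInterrupt and runDemo are hypothetical names, and the sketch assumes the worker's main loop consumes actions from a blocking queue, as the phrase "kernel manipulation queue" suggests. It shows (a) how catching InterruptedException without restoring the flag lets the worker block forever, (b) how restoring the flag lets the interrupt take effect, (c) a stop() failsafe that enqueues a throwing action, and (d) a bounded 10-second join in place of an unbounded wait.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

public class WorkerCancellationSketch {

  /** Hypothetical stand-in for one action on the worker's kernel manipulation queue. */
  interface KernelAction {
    void apply() throws Exception;
  }

  /** Hypothetical exception that stop() throws into the queue. */
  static class WorkerStoppedException extends RuntimeException {
    WorkerStoppedException() {
      super("worker stopped");
    }
  }

  static class SketchWorker implements Runnable {
    private final BlockingQueue<KernelAction> queue = new LinkedBlockingQueue<>();
    volatile Throwable exitReason;

    void submit(KernelAction action) {
      queue.add(action);
    }

    /** Failsafe that works even if the interrupt flag was lost. */
    void stop() {
      queue.add(() -> {
        throw new WorkerStoppedException();
      });
    }

    @Override
    public void run() {
      try {
        while (true) {
          // take() throws InterruptedException if the interrupt flag is set.
          queue.take().apply();
        }
      } catch (Throwable t) {
        exitReason = t; // Records why the main loop ended.
      }
    }
  }

  /** Catching InterruptedException clears the flag; only restoreFlag=true puts it back. */
  static void sleepHandlingInterrupt(long millis, boolean restoreFlag) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      if (restoreFlag) {
        Thread.currentThread().interrupt(); // Fixed: the next blocking call sees it.
      }
      // Otherwise (buggy): the interrupt is swallowed and lost.
    }
  }

  static class Outcome {
    boolean aliveAfterInterrupt;
    boolean exited;
    Throwable exitReason;
  }

  static Outcome runDemo(boolean restoreFlag) throws InterruptedException {
    SketchWorker worker = new SketchWorker();
    Thread thread = new Thread(worker, "sketch-worker");
    thread.setDaemon(true); // Never keep the JVM alive if the sketch misbehaves.
    CountDownLatch started = new CountDownLatch(1);
    worker.submit(() -> {
      started.countDown();
      sleepHandlingInterrupt(60_000, restoreFlag); // Simulates handling a work order.
    });
    thread.start();
    started.await();
    thread.interrupt(); // Primary cancellation mechanism.
    thread.join(500);
    Outcome outcome = new Outcome();
    outcome.aliveAfterInterrupt = thread.isAlive();
    if (outcome.aliveAfterInterrupt) {
      worker.stop(); // Failsafe: the interrupt was swallowed.
    }
    thread.join(10_000); // Bounded wait instead of waiting forever.
    outcome.exited = !thread.isAlive();
    outcome.exitReason = worker.exitReason;
    return outcome;
  }

  public static void main(String[] args) throws InterruptedException {
    for (boolean restore : new boolean[] {false, true}) {
      Outcome o = runDemo(restore);
      System.out.println("restoreFlag=" + restore + " aliveAfterInterrupt="
          + o.aliveAfterInterrupt + " exited=" + o.exited + " exitReason=" + o.exitReason);
    }
  }
}
```

With restoreFlag=false, the worker should still be alive after the interrupt and should exit only through the WorkerStoppedException delivered by stop(). With restoreFlag=true, the restored flag makes the next take() throw, so the worker exits on its own. One design consequence in this sketch: stop() only takes effect once the loop returns to the queue, so it complements the interrupt rather than replacing it.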
