kfaraz opened a new pull request, #14385: URL: https://github.com/apache/druid/pull/14385
### Description

If the current leader coordinator is asked to stop being leader, the following happens:
- The `DruidCoordinator.balancerExec` (used for strategy cost computations) is shut down.
- The currently running duty finishes execution normally and no more duties are executed.
- An exception to this is the `BalanceSegments` duty, which can exit abnormally or even get stuck in the scenarios explained below.

#### ✅ Case 1: `BalanceSegments` duty throws exception

Typical sequence of events:
- Current coordinator stops being leader and `balancerExec` is shut down
- `CostBalancerStrategy.findNewSegmentHomeBalancer()` or any other method is invoked
- `computeCost()` tasks are submitted to the executor
- Since the executor has already been shut down, submission of new tasks throws an exception and ends the coordinator run as desired

#### ❌ Case 2: `BalanceSegments` duty gets stuck

Typical sequence of events:
- `BalanceSegments` duty is in progress
- `CostBalancerStrategy.findNewSegmentHomeBalancer()` is invoked for some segment
- `computeCost()` tasks are submitted to the executor
- Current coordinator stops being leader and `balancerExec` is shut down
- Since the `computeCost()` tasks do not handle interrupts, the method waits indefinitely on the task futures

#### ✅ Case 3: Change in `balancerComputeThreads` dynamic config

A change in this config also results in a shutdown of the `balancerExec`. But this shutdown never happens concurrently with the coordinator duties and thus does not cause the coordinator to get stuck.

### Changes

- Add a timeout of 1 minute to the `resultFuture.get()`. 1 minute is the typical time for a full coordinator run and is more than enough time for cost computations of a single segment.
- Raise an alert if an exception is encountered while computing costs, but only if the executor has not been shut down. A shutdown is intentional and does not require an alert.
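The cases and the timeout change described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code in this PR: the class `CostFutureTimeoutSketch`, the `Outcome` enum, and all method names are hypothetical. The stuck future is simulated by a queued task that `shutdownNow()` discards, which is only a convenient stand-in for a future that never completes and not necessarily the mechanism in Druid. The timeout is shortened from 1 minute to 200 ms so the sketch runs quickly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class CostFutureTimeoutSketch
{
  /** Hypothetical result type, for illustration only. */
  enum Outcome { COMPLETED, ABORTED_QUIETLY, ABORTED_WITH_ALERT }

  /**
   * Waits for a cost result with a bounded timeout instead of an unbounded get().
   * A failure warrants an alert only if the executor is still running,
   * because a shutdown is intentional.
   */
  static Outcome awaitCost(ExecutorService exec, Future<Double> resultFuture, long timeoutMillis)
  {
    try {
      resultFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return Outcome.COMPLETED;
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status for callers
      return exec.isShutdown() ? Outcome.ABORTED_QUIETLY : Outcome.ABORTED_WITH_ALERT;
    }
    catch (ExecutionException | TimeoutException e) {
      return exec.isShutdown() ? Outcome.ABORTED_QUIETLY : Outcome.ABORTED_WITH_ALERT;
    }
  }

  /** Case 1: submitting to an executor that is already shut down is rejected. */
  static boolean submitAfterShutdownIsRejected()
  {
    ExecutorService exec = Executors.newSingleThreadExecutor();
    exec.shutdownNow();
    try {
      exec.submit(() -> 1.0);
      return false;
    }
    catch (RejectedExecutionException e) {
      return true;
    }
  }

  /** Case 2 stand-in: the executor is shut down while a result is still pending. */
  static Outcome stuckFutureAfterShutdown()
  {
    ExecutorService exec = Executors.newSingleThreadExecutor();
    CountDownLatch neverReleased = new CountDownLatch(1);
    // Occupies the only worker thread until shutdownNow() interrupts it.
    exec.submit(() -> { neverReleased.await(); return 0.0; });
    // Queued behind the first task; shutdownNow() discards it, so its future never completes.
    Future<Double> queued = exec.submit(() -> 1.0);
    exec.shutdownNow();
    // An unbounded queued.get() would block forever here; the bounded wait returns.
    return awaitCost(exec, queued, 200);
  }

  /** A failure on a live executor is unexpected and should raise an alert. */
  static Outcome failingTaskOnLiveExecutor()
  {
    ExecutorService exec = Executors.newSingleThreadExecutor();
    try {
      Callable<Double> failing = () -> { throw new IllegalStateException("cost computation failed"); };
      return awaitCost(exec, exec.submit(failing), 200);
    }
    finally {
      exec.shutdownNow();
    }
  }

  public static void main(String[] args)
  {
    System.out.println(submitAfterShutdownIsRejected()); // true
    System.out.println(stuckFutureAfterShutdown());      // ABORTED_QUIETLY
    System.out.println(failingTaskOnLiveExecutor());     // ABORTED_WITH_ALERT
  }
}
```

Checking `isShutdown()` only after the wait fails keeps the normal path free of extra checks, and the same check decides whether the failure is reported or treated as an expected side effect of losing leadership.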
