rzo1 commented on issue #8590: URL: https://github.com/apache/storm/issues/8590#issuecomment-4421116007
Thanks @mwkang — well-structured proposal, the diagrams make the problem and the fix immediately legible. Default-off opt-in plus the binary trigger argument (no cooldown needed because "almost balanced" can't fire it) is the right framing, and the round-robin-across-topologies choice is a nice touch. A handful of things I'd want clarified or expanded before this moves to a PR: 1. The proposal is scoped to `EvenScheduler` / `DefaultScheduler`, but the de-facto modern scheduler in Storm is `ResourceAwareScheduler`, which uses `needsSchedulingRas` and an entirely different placement engine. Most non-trivial deployments run RAS, so it'd be good to state explicitly whether (a) RAS users are intentionally out of scope for this change, or (b) a parallel mechanism is envisioned later. Either is fine — but the doc should not leave it ambiguous. 2. Worth being explicit that a "relocation" is a worker JVM restart: brief tuple replay, JIT warmup, possible windowed/stateful bolt churn. The proposal frames the trigger purely in balance terms, but the cost side is real and is exactly why some operators will choose to keep this off. A short "when you would *not* want to enable this" note would help operators decide. 3. Interaction with `IsolationScheduler` / `MultitenantScheduler` is worth verifying explicitly: isolated topologies should never be selected as donors, and a supervisor reserved for an isolation slot shouldn't be considered "idle" just because the isolated topology happens to be down. A test for each case would be reassuring. 4. "Most-loaded supervisor of a topology" — by what measure (worker count vs. executor count) and what's the tie-break when two supervisors are equally loaded? Determinism matters for operators trying to understand why a particular worker moved. The PR should specify it and tests should pin it. 5. The binary-trigger argument convincingly rules out *thrash*, but a supervisor returning from maintenance can still flap on a slow JVM startup or a transient network blip. Moving workers onto a node that disappears 30s later doubles the disruption. A small "supervisor must have been alive for N scheduling rounds before being considered idle-and-available" guard would be cheap insurance — happy to see it as either a third config knob or a fixed conservative default. 6. `needsScheduling` is on a broadly-called path — invoked from `needsSchedulingTopologies` and indirectly by callers that aren't EvenScheduler-aware. Probably fine because RAS uses `needsSchedulingRas`, but worth confirming in the PR description that no other scheduler picks up a surprise "needs rescheduling" signal from the new branch. 7. `nimbus.even.rebalance.idle-supervisor.enabled` mixes dots and dashes. Storm convention is dots only (e.g. `nimbus.even.rebalance.idle.supervisor.enabled`). Minor, but easier to fix now than after the key is documented. 8. "One worker per topology per round-robin iteration" describes the *fairness rule*, but the *aggregate* simultaneous disruption is `min(idle_slots, eligible_topologies)` worker restarts in one scheduling pass. On a "one returning supervisor with 8 slots, 50 topologies" cluster that's 8 simultaneous worker restarts across 8 topologies. Worth quantifying so operators understand the realistic blast radius before they opt in. **Overall take:** the core idea is sound, the safety guards are well-chosen, and the API/config story is appropriately conservative. I'd encourage you to open the PR. The asks before merge from my side would be: explicit scoping wrt RAS, an honest disruption note in the docs, isolation/multitenant interaction tests, and the smaller additions (grace period, deterministic tie-break, config-key naming). The binary-trigger argument and the round-robin policy are the strongest parts of the design — please keep both front-and-center in the PR description so reviewers don't re-litigate them. Only my pov, but others might jump in too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
