Re: [I] [PROPOSAL] EvenScheduler: opt-in round-robin rebalance to populate returning idle supervisors (storm)

via GitHub Mon, 11 May 2026 06:25:45 -0700


rzo1 commented on issue #8590:
URL: https://github.com/apache/storm/issues/8590#issuecomment-4421116007


   Thanks @mwkang — well-structured proposal, the diagrams make the problem and 
the fix immediately legible. Default-off opt-in plus the binary trigger 
argument (no cooldown needed because "almost balanced" can't fire it) is the 
right framing, and the round-robin-across-topologies choice is a nice touch. A 
handful of things I'd want clarified or expanded before this moves to a PR:
   
   1. The proposal is scoped to `EvenScheduler` / `DefaultScheduler`, but the 
de-facto modern scheduler in Storm is `ResourceAwareScheduler`, which uses 
`needsSchedulingRas` and an entirely different placement engine. Most 
non-trivial deployments run RAS, so it'd be good to state explicitly whether 
(a) RAS users are intentionally out of scope for this change, or (b) a parallel 
mechanism is envisioned later. Either is fine — but the doc should not leave it 
ambiguous.
   
   2. Worth being explicit that a "relocation" is a worker JVM restart: brief 
tuple replay, JIT warmup, possible windowed/stateful bolt churn. The proposal 
frames the trigger purely in balance terms, but the cost side is real and is 
exactly why some operators will choose to keep this off. A short "when you 
would *not* want to enable this" note would help operators decide.
   
   3. Interaction with `IsolationScheduler` / `MultitenantScheduler` is worth 
verifying explicitly: isolated topologies should never be selected as donors, 
and a supervisor reserved for an isolation slot shouldn't be considered "idle" 
just because the isolated topology happens to be down. A test for each case 
would be reassuring.
   
   4. "Most-loaded supervisor of a topology" — by what measure (worker count 
vs. executor count) and what's the tie-break when two supervisors are equally 
loaded? Determinism matters for operators trying to understand why a particular 
worker moved. The PR should specify it and tests should pin it.
   
   5. The binary-trigger argument convincingly rules out *thrash*, but a 
supervisor returning from maintenance can still flap on a slow JVM startup or a 
transient network blip. Moving workers onto a node that disappears 30s later 
doubles the disruption. A small "supervisor must have been alive for N 
scheduling rounds before being considered idle-and-available" guard would be 
cheap insurance — happy to see it as either a third config knob or a fixed 
conservative default.
   
   6. `needsScheduling` is on a broadly-called path — invoked from 
`needsSchedulingTopologies` and indirectly by callers that aren't 
EvenScheduler-aware. Probably fine because RAS uses `needsSchedulingRas`, but 
worth confirming in the PR description that no other scheduler picks up a 
surprise "needs rescheduling" signal from the new branch.
   
   7. `nimbus.even.rebalance.idle-supervisor.enabled` mixes dots and dashes. 
Storm convention is dots only (e.g. 
`nimbus.even.rebalance.idle.supervisor.enabled`). Minor, but easier to fix now 
than after the key is documented.
   
   8. "One worker per topology per round-robin iteration" describes the 
*fairness rule*, but the *aggregate* simultaneous disruption is 
`min(idle_slots, eligible_topologies)` worker restarts in one scheduling pass. 
On a "one returning supervisor with 8 slots, 50 topologies" cluster that's 8 
simultaneous worker restarts across 8 topologies. Worth quantifying so 
operators understand the realistic blast radius before they opt in.
   
   **Overall take:** the core idea is sound, the safety guards are well-chosen, 
and the API/config story is appropriately conservative. I'd encourage you to 
open the PR. The asks before merge from my side would be: explicit scoping wrt 
RAS, an honest disruption note in the docs, isolation/multitenant interaction 
tests, and the smaller additions (grace period, deterministic tie-break, 
config-key naming). The binary-trigger argument and the round-robin policy are 
the strongest parts of the design — please keep both front-and-center in the PR 
description so reviewers don't re-litigate them.
   
   Only my pov, but others might jump in too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] [PROPOSAL] EvenScheduler: opt-in round-robin rebalance to populate returning idle supervisors (storm)

Reply via email to