Hi Dennis,

Thank you for simplifying the FLIP based on the feedback. I reviewed the updated FLIP and it's easy to reason about and removes earlier complexity. Much better than before.

+1 (non-binding)

Thanks,
Vivek Jhaver

On Thu, Jun 11, 2026 at 3:19 PM Gyula Fóra <[email protected]> wrote:

> Thanks Dennis for the update and Vivek for the feedback.
> I think this is much simpler now and also improves the out of the box
> experience instead of complicating it further.
>
> +1 (binding)
>
> Gyula
>
> On Thu, Jun 11, 2026 at 11:27 AM Dennis-Mircea Ciupitu <
> [email protected]> wrote:
>
> > Hi Gyula,
> >
> > Thanks, we are fully aligned now. I dropped the ADVANCED composable part
> > and refined the modes instead, and I updated the FLIP [1] again to make the
> > new shape crystal clear. I updated the PR [2] as well to reflect latest
> > changes.
> >
> > The front door is now three simple modes: BALANCED (default),
> > EVENLY_SPREAD, and OFF, with no primary plus fallback composition. Each
> > mode runs a single search per scaling direction and, when it finds no
> > aligned value, keeps the autoscaler's computed target instead of blocking
> > the scale. So the default never stalls a scale, and it leans toward
> > responsiveness to load spikes over squeezing resources, which should make
> > it a good fit for most users out of the box.
> >
> > Extensibility now lives in a plugin SPI rather than an ADVANCED mode.
> > Custom modes are discovered as plugins (ServiceLoader in standalone,
> > PluginManager in the operator) and selected by name, following the Scaling
> > Executor Plugin SPI convention. That keeps the everyday config small and is
> > exactly why ADVANCED is gone. Anyone pinned to the deprecated key keeps its
> > exact previous behavior, with the note that these are going to be removed
> > in a future version.
> >
> > Thanks,
> > Dennis
> >
> > [1]
> > https://docs.google.com/document/d/18nh9D1fYqErky12WHznzSufXzt6rkm3tGbNkcfgTTvE/edit?usp=sharing
> > [2] https://github.com/apache/flink-kubernetes-operator/pull/1088
> >
> > On Mon, Jun 8, 2026 at 12:08 PM Gyula Fóra <[email protected]> wrote:
> >
> > > Hey!
> > > Based on this I like the 3 options:
> > > BALANCED, STRICT_DIVISOR, MAXIMIZE_UTILISATION
> > >
> > > But I would personally avoid adding the "ADVANCED" composable part; I would
> > > rather refine the 3 options to have more reasonable fallback behaviour if
> > > necessary. I think this is already complicated enough that most people won't
> > > touch it and just go with the default (and the default should
> > > preferably work well for most, even if we have to change/tweak it a
> > > little).
> > >
> > > Especially if we want to make it pluggable later then we should not have an
> > > ADVANCED option.
> > >
> > > Gyula
> > >
> > > On Sun, Jun 7, 2026 at 12:16 PM Dennis-Mircea Ciupitu <
> > > [email protected]> wrote:
> > >
> > > > Thanks Vivek for the detailed review, and thanks Gyula for weighing in. You
> > > > are both right on the central point. The flat mode plus fallback surface is
> > > > harder to reason about than it should be, and EVENLY_SPREAD should not be
> > > > silently redefined. I want to revise the FLIP rather than defend the
> > > > current shape, and I think the result is genuinely better.
> > > >
> > > > The reframing is: The alignment behavior has three real degrees of freedom
> > > > under the hood - where we search (within the current-to-target range, or
> > > > above the target), how strict we are about accepting a parallelism (exact
> > > > divisor, load reducing, or any non-empty), and what we do on failure
> > > > (block, or relax). The original proposal exposed these as a mode times
> > > > fallback cross-product, which is where the complexity came from. Most of
> > > > that cross-product is not actually meaningful (many combinations are
> > > > redundant or no-ops), and you are right that asking users to navigate it
> > > > does not serve them.
> > > >
> > > > Revised design, a small named front door over those axes:
> > > >
> > > >    - BALANCED (default) - Avoid skew, and if no clean divisor exists, scale
> > > >    anyway rather than get stuck. This reproduces today's default behavior
> > > >    exactly.
> > > >    - STRICT_DIVISOR - Only scale to an exact divisor between current and
> > > >    target. If none exists, do not scale and emit an event.
> > > >    - MAXIMIZE_UTILISATION - Always reduce per-subtask load above the
> > > >    target, snapping to a divisor when reachable. Unchanged from today.
> > > >
> > > > Three everyday modes, which I believe is the "3 reasonable modes" you asked
> > > > for, Gyula. The parallelism alignment schema itself is genuinely complex (a
> > > > target parallelism interacts with the key group or partition count across
> > > > two search regions and several acceptance policies), so for the small set
> > > > of advanced users who need to tune it there is an optional ADVANCED mode.
> > > > ADVANCED composes the existing, already-proven built-in strategies as a
> > > > primary plus an optional fallback, with a validator that rejects the
> > > > redundant and self-referential combinations at config load. It deliberately
> > > > keeps the current strategies rather than a reduced model, because that is
> > > > exactly the expressiveness an advanced user reaches for. So the confusing
> > > > cross-product is gone from the front door, the complete schema stays
> > > > available to those who need it, and the few meaningful compositions are
> > > > reachable and guarded.
> > > >
> > > > On the EVENLY_SPREAD migration, which I think is the most important fix. I
> > > > am retiring the EVENLY_SPREAD token rather than redefining it. Existing
> > > > configs that pin EVENLY_SPREAD keep mapping to BALANCED, the algorithm that
> > > > string actually had, through the old key kept as a deprecated option.
> > > > Anyone who wants the new exact-only behavior uses STRICT_DIVISOR. That
> > > > removes the silent behavior change entirely.
> > > >
> > > > On JobVertexScaler, you are right that it had taken on too much. I
> > > > extracted the alignment logic into a dedicated alignment package (it shrank
> > > > JobVertexScaler from ~1270 to ~660 lines), which also gives the strategy
> > > > resolution and the validator a clean home and removes the duplicated
> > > > scale-up and scale-down paths. The built-in strategies now sit behind an
> > > > @Experimental AlignmentStrategy interface, so a pluggable custom-strategy
> > > > loader is a clean follow-up FLIP later rather than hard-coded search logic
> > > > now. If we do ship that loader, it would follow the same ServiceLoader
> > > > plugin convention as FLIP-575 (Scaling Executor Plugin SPI) rather than a
> > > > new mechanism, and it is complementary to FLIP-575 since alignment is a
> > > > per-vertex step inside the parallelism computation while FLIP-575
> > > > intercepts the final decisions.
> > > >
> > > > On the config key naming you raised on the discussion thread, Gyula, I
> > > > agree the current key is poor. Rather than key-group-alignment, I went one
> > > > step further to job.autoscaler.scaling.alignment.mode (with the advanced
> > > > primary and fallback keys under the same scaling.alignment prefix), keeping
> > > > the old key as a fallback so existing configs keep working. The reason is
> > > > that the feature aligns parallelism to key groups or source partitions
> > > > equally, so a neutral "alignment" name fits better than
> > > > "key-group-alignment", and it matches the new AlignmentMode and
> > > > AlignmentStrategy types.
> > > >
> > > > I created a draft redesigned FLIP doc [1] and updated the draft PR [2]
> > > > reflecting all of the above. Does this direction address the concerns?
> > > >
> > > > Best regards,
> > > > Dennis
> > > >
> > > > [1]
> > > > https://docs.google.com/document/d/18nh9D1fYqErky12WHznzSufXzt6rkm3tGbNkcfgTTvE/edit?usp=sharing
> > > > [2] https://github.com/apache/flink-kubernetes-operator/pull/1088
> > > >
> > > > On Fri, Jun 5, 2026 at 9:43 PM <[email protected]> wrote:
> > > >
> > > > > Thanks Vivek,
> > > > >
> > > > > I am inclined to agree here that making the config complex this way
> > > > > doesn’t really serve most users. If we could create 3 reasonable modes that
> > > > > would cover most use cases that would be best.
> > > > >
> > > > > Cheers
> > > > > Gyula
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > > On 5 Jun 2026, at 16:06, Vivek Jhaver <[email protected]> wrote:
> > > > > >
> > > > > > Vivek
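
For readers skimming the thread, the "single search, then keep the computed target" behavior described for the simplified modes could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in the PR: the class name AlignmentSketch, the two-value Mode enum, the upward-only search, and the upperBound parameter are all hypothetical. Only the idea of looking for a parallelism that divides the key group count, and falling back to the autoscaler's target when none is found, comes from the discussion.

```java
import java.util.OptionalInt;

/** Hypothetical sketch of an alignment search; not the operator's actual classes. */
public final class AlignmentSketch {

    /** Mode names mirror the thread; the behavior modeled here is an assumption. */
    enum Mode { BALANCED, OFF }

    /**
     * One upward search (assumed direction for a scale-up): the first parallelism
     * in [target, upperBound] that divides numKeyGroups evenly.
     */
    static OptionalInt firstDivisorAtOrAbove(int target, int upperBound, int numKeyGroups) {
        for (int p = Math.max(1, target); p <= upperBound; p++) {
            if (numKeyGroups % p == 0) {
                return OptionalInt.of(p);
            }
        }
        return OptionalInt.empty();
    }

    /** Returns the aligned parallelism, or the computed target when no aligned value exists. */
    static int align(Mode mode, int target, int upperBound, int numKeyGroups) {
        if (mode == Mode.OFF) {
            return target; // alignment disabled: keep the autoscaler's target as-is
        }
        // Never block the scale: fall back to the computed target.
        return firstDivisorAtOrAbove(target, upperBound, numKeyGroups).orElse(target);
    }

    public static void main(String[] args) {
        System.out.println(align(Mode.BALANCED, 7, 10, 128)); // 8 divides 128
        System.out.println(align(Mode.BALANCED, 9, 12, 128)); // no divisor in [9, 12], keeps 9
        System.out.println(align(Mode.OFF, 7, 10, 128));      // alignment off, keeps 7
    }
}
```

The fallback to the target is the part that matters for the "default never stalls a scale" property; a stricter mode would instead return an empty result and skip the scale.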
