Hi Dennis,

Thank you for simplifying the FLIP based on the feedback. I reviewed the
updated FLIP: it is easy to reason about and removes the earlier
complexity. Much better than before.

+1 (non-binding)

Thanks,
Vivek Jhaver

On Thu, Jun 11, 2026 at 3:19 PM Gyula Fóra wrote:

> Thanks Dennis for the update and Vivek for the feedback.
> I think this is much simpler now and also improves the out-of-the-box
> experience instead of complicating it further.
>
> +1 (binding)
>
> Gyula
>
> On Thu, Jun 11, 2026 at 11:27 AM Dennis-Mircea Ciupitu wrote:
>
> > Hi Gyula,
> >
> > Thanks, we are fully aligned now. I dropped the ADVANCED composable part
> > and refined the modes instead, and I updated the FLIP [1] again to make
> > the new shape crystal clear. I also updated the PR [2] to reflect the
> > latest changes.
> >
> > The front door is now three simple modes: BALANCED (default),
> > EVENLY_SPREAD, and OFF, with no primary-plus-fallback composition. Each
> > mode runs a single search per scaling direction and, when it finds no
> > aligned value, keeps the autoscaler's computed target instead of blocking
> > the scale. So the default never stalls a scaling decision, and it favors
> > responsiveness to load spikes over minimizing resource usage, which
> > should make it a good fit for most users out of the box.
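
To make the "single search, then keep the computed target" rule concrete, here is a minimal Java sketch. It is not the PR's code: the class and method names, the assumed search region (upward from the target when scaling up, from the target back toward the current parallelism when scaling down), and the exact-divisor acceptance rule are all illustrative assumptions. Because the thread does not spell out how BALANCED and EVENLY_SPREAD differ in which candidates they accept, the sketch takes the acceptance rule as a parameter.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch: one search per scaling direction, then fall back to the target. */
final class AlignmentSketch {

    enum Mode { BALANCED, EVENLY_SPREAD, OFF }

    /**
     * @param current    parallelism the vertex runs with today
     * @param target     parallelism computed by the autoscaler
     * @param upperBound highest parallelism a scale-up may reach (assumed input)
     * @param accept     mode-specific acceptance rule (assumed input)
     * @return an aligned parallelism, or {@code target} when nothing in range is accepted
     */
    static int align(Mode mode, int current, int target, int upperBound, IntPredicate accept) {
        if (mode == Mode.OFF || target == current) {
            return target;
        }
        // Assumed search region: scale-up walks upward from the target to upperBound;
        // scale-down walks from the target back toward, but not reaching, current.
        int limit = target > current ? upperBound : current - 1;
        for (int p = target; p <= limit; p++) {
            if (accept.test(p)) {
                return p;
            }
        }
        // No aligned value found: keep the computed target rather than blocking the scale.
        return target;
    }

    /** Example acceptance rule: p divides the key-group (or partition) count exactly. */
    static IntPredicate exactDivisorOf(int numKeyGroups) {
        return p -> p > 0 && numKeyGroups % p == 0;
    }
}
```

For example, with 128 key groups, scaling from 4 to a computed target of 5 would snap to 8, while a count of 7 key groups with candidates 3 to 6 has no divisor in range, so the computed target of 3 is kept.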
> >
> > Extensibility now lives in a plugin SPI rather than an ADVANCED mode.
> > Custom modes are discovered as plugins (ServiceLoader in standalone,
> > PluginManager in the operator) and selected by name, following the
> > convention of the Scaling Executor Plugin SPI (FLIP-575). That keeps the
> > everyday config small and is exactly why ADVANCED is gone. Anyone pinned
> > to the deprecated key keeps its exact previous behavior, although these
> > deprecated settings will be removed in a future version.
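
A rough sketch of discovery-by-name on the standalone path, using the JDK's java.util.ServiceLoader. The AlignmentModePlugin interface and AlignmentModeRegistry class are hypothetical names chosen for illustration; the real SPI lives in the PR, and the operator path (Flink's PluginManager) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical SPI for a custom alignment mode; all names are illustrative. */
interface AlignmentModePlugin {
    /** Name users would put in the alignment config key to select this mode. */
    String name();

    /** Returns an aligned parallelism, or the computed target if nothing aligns. */
    int align(int current, int target, int numKeyGroups);
}

/** Hypothetical helper for the standalone path: discover plugins, then pick one by name. */
final class AlignmentModeRegistry {

    /** Loads every implementation listed under META-INF/services on the classpath. */
    static List<AlignmentModePlugin> discover() {
        List<AlignmentModePlugin> found = new ArrayList<>();
        for (AlignmentModePlugin plugin : ServiceLoader.load(AlignmentModePlugin.class)) {
            found.add(plugin);
        }
        return found;
    }

    /** Returns the plugin whose name matches (case-insensitively, an assumption), if any. */
    static Optional<AlignmentModePlugin> select(String configuredName,
                                                Iterable<AlignmentModePlugin> candidates) {
        for (AlignmentModePlugin plugin : candidates) {
            if (plugin.name().equalsIgnoreCase(configuredName)) {
                return Optional.of(plugin);
            }
        }
        return Optional.empty();
    }
}
```

Implementations would be registered through a META-INF/services file named after the interface, and the caller would pass discover() into select() with the configured mode name. Matching names case-insensitively is an assumption of this sketch.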
> >
> > Thanks,
> > Dennis
> >
> > [1] https://docs.google.com/document/d/18nh9D1fYqErky12WHznzSufXzt6rkm3tGbNkcfgTTvE/edit?usp=sharing
> > [2] https://github.com/apache/flink-kubernetes-operator/pull/1088
> >
> >
> > On Mon, Jun 8, 2026 at 12:08 PM Gyula Fóra wrote:
> >
> > > Hey!
> > > Based on this I like the 3 options:
> > > BALANCED, STRICT_DIVISOR, MAXIMIZE_UTILISATION
> > >
> > > But I would personally avoid adding the "ADVANCED" composable part; I
> > > would rather refine the 3 options to have more reasonable fallback
> > > behaviour if necessary. I think this is already complicated enough that
> > > most people won't touch it and will just go with the default (and the
> > > default should preferably work well for most, even if we have to
> > > change/tweak it a little).
> > >
> > > Especially if we want to make it pluggable later, we should not have
> > > an ADVANCED option.
> > >
> > > Gyula
> > >
> > > On Sun, Jun 7, 2026 at 12:16 PM Dennis-Mircea Ciupitu wrote:
> > >
> > > > Thanks Vivek for the detailed review, and thanks Gyula for weighing in.
> > > > You are both right on the central point. The flat mode-plus-fallback
> > > > surface is harder to reason about than it should be, and EVENLY_SPREAD
> > > > should not be silently redefined. I want to revise the FLIP rather than
> > > > defend the current shape, and I think the result is genuinely better.
> > > >
> > > > Here is the reframing. Under the hood, the alignment behavior has three
> > > > real degrees of freedom: where we search (within the current-to-target
> > > > range, or above the target), how strict we are about accepting a
> > > > parallelism (exact divisor, load-reducing, or any non-empty), and what
> > > > we do on failure (block, or relax). The original proposal exposed these
> > > > as a mode-times-fallback cross-product, which is where the complexity
> > > > came from. Most of that cross-product is not actually meaningful (many
> > > > combinations are redundant or no-ops), and you are right that asking
> > > > users to navigate it does not serve them.
> > > >
> > > > The revised design is a small, named front door over those axes:
> > > >
> > > >    - BALANCED (default): avoid skew, and if no clean divisor exists,
> > > >    scale anyway rather than get stuck. This reproduces today's default
> > > >    behavior exactly.
> > > >    - STRICT_DIVISOR: only scale to an exact divisor between current and
> > > >    target. If none exists, do not scale and emit an event.
> > > >    - MAXIMIZE_UTILISATION: always reduce per-subtask load above the
> > > >    target, snapping to a divisor when reachable. Unchanged from today.
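
The three axes from the reframing can be modeled as small enums driving one parameterized search, as in this hypothetical Java sketch. Only STRICT_DIVISOR's mapping onto the axes (current-to-target region, exact divisor, block on failure) is spelled out in the list; the meanings given to LOAD_REDUCING and ANY_NON_EMPTY, the nearest-to-target search order, and all names are assumptions for illustration.

```java
import java.util.OptionalInt;

/** Hypothetical model of the three alignment axes; names and predicate meanings are assumed. */
final class AlignmentAxes {

    enum SearchRegion { CURRENT_TO_TARGET, ABOVE_TARGET }

    enum Acceptance { EXACT_DIVISOR, LOAD_REDUCING, ANY_NON_EMPTY }

    enum OnFailure { BLOCK, RELAX }

    /** Largest number of key groups any single subtask receives at parallelism p. */
    static int maxLoad(int numKeyGroups, int p) {
        return (numKeyGroups + p - 1) / p; // ceil(numKeyGroups / p)
    }

    static boolean accepts(Acceptance rule, int p, int target, int numKeyGroups) {
        switch (rule) {
            case EXACT_DIVISOR:
                return numKeyGroups % p == 0;
            case LOAD_REDUCING: // assumed: strictly lower max per-subtask load than at target
                return maxLoad(numKeyGroups, p) < maxLoad(numKeyGroups, target);
            case ANY_NON_EMPTY: // assumed: every subtask gets at least one key group
                return p <= numKeyGroups;
            default:
                throw new IllegalArgumentException("Unknown rule: " + rule);
        }
    }

    /** Returns the chosen parallelism; an empty result means "do not scale" (BLOCK). */
    static OptionalInt resolve(SearchRegion region, Acceptance rule, OnFailure onFailure,
                               int current, int target, int numKeyGroups, int upperBound) {
        if (region == SearchRegion.CURRENT_TO_TARGET) {
            // Nearest-to-target first, stepping back toward current (current itself excluded).
            int step = target > current ? -1 : 1;
            for (int p = target; p != current; p += step) {
                if (accepts(rule, p, target, numKeyGroups)) {
                    return OptionalInt.of(p);
                }
            }
        } else {
            for (int p = target; p <= upperBound; p++) {
                if (accepts(rule, p, target, numKeyGroups)) {
                    return OptionalInt.of(p);
                }
            }
        }
        return onFailure == OnFailure.BLOCK ? OptionalInt.empty() : OptionalInt.of(target);
    }

    /** STRICT_DIVISOR as listed: exact divisor between current and target, else block. */
    static OptionalInt strictDivisor(int current, int target, int numKeyGroups) {
        return resolve(SearchRegion.CURRENT_TO_TARGET, Acceptance.EXACT_DIVISOR,
                OnFailure.BLOCK, current, target, numKeyGroups, target);
    }
}
```

The 2 × 3 × 2 = 12 combinations of these enums are the cross-product the reframing describes. With 128 key groups, strictDivisor(4, 7, 128) returns empty because none of 5, 6, or 7 divides 128, which corresponds to "do not scale".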
> > > >
> > > > That gives three everyday modes, which I believe are the "3 reasonable
> > > > modes" you asked for, Gyula. The parallelism alignment schema itself is
> > > > genuinely complex (a target parallelism interacts with the key-group or
> > > > partition count across two search regions and several acceptance
> > > > policies), so for the small set of advanced users who need to tune it,
> > > > there is an optional ADVANCED mode. ADVANCED composes the existing,
> > > > already-proven built-in strategies as a primary plus an optional
> > > > fallback, with a validator that rejects redundant and self-referential
> > > > combinations at config load. It deliberately keeps the current
> > > > strategies rather than a reduced model, because that is exactly the
> > > > expressiveness an advanced user reaches for. So the confusing
> > > > cross-product is gone from the front door, the complete schema stays
> > > > available to those who need it, and the few meaningful compositions are
> > > > reachable and guarded.
> > > >
> > > > On the EVENLY_SPREAD migration, which I think is the most important
> > > > fix: I am retiring the EVENLY_SPREAD token rather than redefining it.
> > > > Existing configs that pin EVENLY_SPREAD on the old key, which stays as a
> > > > deprecated option, keep mapping to BALANCED, the algorithm that token
> > > > actually selected. Anyone who wants the new exact-only behavior can use
> > > > STRICT_DIVISOR. That removes the silent behavior change entirely.
> > > >
> > > > On JobVertexScaler, you are right that it had taken on too much. I
> > > > extracted the alignment logic into a dedicated alignment package
> > > > (shrinking JobVertexScaler from ~1270 to ~660 lines), which also gives
> > > > the strategy resolution and the validator a clean home and removes the
> > > > duplicated scale-up and scale-down paths. The built-in strategies now
> > > > sit behind an @Experimental AlignmentStrategy interface, so a pluggable
> > > > custom-strategy loader can come later as a clean follow-up FLIP instead
> > > > of more hard-coded search logic now. If we do ship that loader, it would
> > > > follow the same ServiceLoader plugin convention as FLIP-575 (Scaling
> > > > Executor Plugin SPI) rather than introduce a new mechanism. It is also
> > > > complementary to FLIP-575: alignment is a per-vertex step inside the
> > > > parallelism computation, while FLIP-575 intercepts the final decisions.
> > > >
> > > > On the config key naming you raised on the discussion thread, Gyula, I
> > > > agree the current key is poor. Rather than key-group-alignment, I went
> > > > one step further and chose job.autoscaler.scaling.alignment.mode (with
> > > > the advanced primary and fallback keys under the same scaling.alignment
> > > > prefix), keeping the old key as a fallback so existing configs keep
> > > > working. The reason is that the feature aligns parallelism to key groups
> > > > and source partitions alike, so a neutral "alignment" name fits better
> > > > than "key-group-alignment", and it matches the new AlignmentMode and
> > > > AlignmentStrategy types.
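
The migration and key-naming rules above can be sketched as a plain Java lookup over a map of config entries. The new key name comes from this thread; the deprecated key's real name is not given here, so DEPRECATED_KEY holds an obvious placeholder. The assumption that the old key accepted EVENLY_SPREAD and MAXIMIZE_UTILISATION is also illustrative, and in Flink itself this would more naturally be declared as a ConfigOption with a fallback or deprecated key than resolved by hand.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of resolving the alignment mode, honoring a deprecated key. */
final class AlignmentConfigSketch {

    enum AlignmentMode { BALANCED, STRICT_DIVISOR, MAXIMIZE_UTILISATION }

    /** New key, as named in the thread. */
    static final String MODE_KEY = "job.autoscaler.scaling.alignment.mode";

    /** Placeholder only: the old key's real name is not given in the thread. */
    static final String DEPRECATED_KEY = "placeholder.deprecated.alignment.key";

    static AlignmentMode resolve(Map<String, String> conf) {
        String value = conf.get(MODE_KEY);
        if (value != null) {
            // The new key only knows the new tokens, so EVENLY_SPREAD is rejected here.
            return AlignmentMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
        String legacy = conf.get(DEPRECATED_KEY);
        if (legacy != null) {
            String token = legacy.trim().toUpperCase(Locale.ROOT);
            // The retired token keeps the algorithm it actually selected.
            if (token.equals("EVENLY_SPREAD")) {
                return AlignmentMode.BALANCED;
            }
            return AlignmentMode.valueOf(token); // e.g. MAXIMIZE_UTILISATION, unchanged
        }
        return AlignmentMode.BALANCED; // default mode
    }
}
```

With this sketch, a config that only sets the old key to EVENLY_SPREAD resolves to BALANCED, the new key wins when both are present, and EVENLY_SPREAD on the new key is rejected (Enum.valueOf throws IllegalArgumentException), which mirrors retiring the token rather than redefining it.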
> > > >
> > > > I created a draft of the redesigned FLIP doc [1] and updated the draft
> > > > PR [2] to reflect all of the above. Does this direction address your
> > > > concerns?
> > > >
> > > > Best regards,
> > > > Dennis
> > > >
> > > > [1] https://docs.google.com/document/d/18nh9D1fYqErky12WHznzSufXzt6rkm3tGbNkcfgTTvE/edit?usp=sharing
> > > > [2] https://github.com/apache/flink-kubernetes-operator/pull/1088
> > > >
> > > >
> > > >
> > > > On Fri, Jun 5, 2026 at 9:43 PM <[email protected]> wrote:
> > > >
> > > > > Thanks Vivek,
> > > > >
> > > > > I am inclined to agree here that making the config complex this way
> > > > > doesn’t really serve most users. If we could create 3 reasonable modes
> > > > > that cover most use cases, that would be best.
> > > > >
> > > > > Cheers
> > > > > Gyula
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > > On 5 Jun 2026, at 16:06, Vivek Jhaver wrote:
> > > > > >
> > > > > > Vivek
> > > > >
> > > >
> > >
> >
>
