Re: [VOTE] FLIP-586: Composable Parallelism Alignment Modes for Flink Autoscaler

Dennis-Mircea Ciupitu Mon, 15 Jun 2026 04:32:43 -0700

Hi everyone,

As it has been nearly two weeks since the voting thread was opened, I'll be
closing it now. The results will be announced in a separate thread shortly.


Thank you,
Dennis


On Fri, Jun 12, 2026 at 4:06 PM Vivek Jhaver <[email protected]> wrote:

> Hi Dennis,
>
> Thank you for simplifying the FLIP based on the feedback. I reviewed the
> updated FLIP and it's easy to reason about and removes earlier complexity.
> Much better than before.
>
> +1 (non-binding)
>
> Thanks,
> Vivek Jhaver
>
> On Thu, Jun 11, 2026 at 3:19 PM Gyula Fóra <[email protected]> wrote:
>
> > Thanks Dennis for the update and Vivek for the feedback.
> > I think this is much simpler now and also improves the out of the box
> > experience instead of complicating it further.
> >
> > +1 (binding)
> >
> > Gyula
> >
> > On Thu, Jun 11, 2026 at 11:27 AM Dennis-Mircea Ciupitu <
> > [email protected]> wrote:
> >
> > > Hi Gyula,
> > >
> > > > Thanks, we are fully aligned now. I dropped the ADVANCED composable
> > > > part and refined the modes instead, and I updated the FLIP [1] again
> > > > to make the new shape crystal clear. I updated the PR [2] as well to
> > > > reflect the latest changes.
> > >
> > > > The front door is now three simple modes: BALANCED (default),
> > > > EVENLY_SPREAD, and OFF, with no primary plus fallback composition. Each
> > > > mode runs a single search per scaling direction and, when it finds no
> > > > aligned value, keeps the autoscaler's computed target instead of
> > > > blocking the scale. So the default never stalls a scale, and it leans
> > > > toward responsiveness to load spikes over squeezing resources, which
> > > > should make it a good fit for most users out of the box.
> > >
> > > > Extensibility now lives in a plugin SPI rather than an ADVANCED mode.
> > > > Custom modes are discovered as plugins (ServiceLoader in standalone,
> > > > PluginManager in the operator) and selected by name, following the
> > > > Scaling Executor Plugin SPI convention. That keeps the everyday config
> > > > small and is exactly why ADVANCED is gone. Anyone pinned to the
> > > > deprecated key keeps its exact previous behavior, with the note that
> > > > the deprecated key is going to be removed in a future version.
> > >
> > > Thanks,
> > > Dennis
> > >
> > > [1]
> > >
> > >
> >
> https://docs.google.com/document/d/18nh9D1fYqErky12WHznzSufXzt6rkm3tGbNkcfgTTvE/edit?usp=sharing
> > > [2] https://github.com/apache/flink-kubernetes-operator/pull/1088
> > >
> > >
> > > > On Mon, Jun 8, 2026 at 12:08 PM Gyula Fóra <[email protected]> wrote:
> > >
> > > > Hey!
> > > > Based on this I like the 3 options:
> > > > BALANCED, STRICT_DIVISOR, MAXIMIZE_UTILISATION
> > > >
> > > > > But I would personally avoid adding the "ADVANCED" composable part; I
> > > > > would rather refine the 3 options to have more reasonable fallback
> > > > > behaviour if necessary. I think this is already complicated enough
> > > > > that most people won't touch it and just go with the default (and the
> > > > > default should preferably work well for most, even if we have to
> > > > > change/tweak it a little).
> > > > >
> > > > > Especially if we want to make it pluggable later then we should not
> > > > > have an ADVANCED option.
> > > >
> > > > Gyula
> > > >
> > > > On Sun, Jun 7, 2026 at 12:16 PM Dennis-Mircea Ciupitu <
> > > > [email protected]> wrote:
> > > >
> > > > > > Thanks Vivek for the detailed review, and thanks Gyula for weighing
> > > > > > in. You are both right on the central point. The flat mode plus
> > > > > > fallback surface is harder to reason about than it should be, and
> > > > > > EVENLY_SPREAD should not be silently redefined. I want to revise the
> > > > > > FLIP rather than defend the current shape, and I think the result is
> > > > > > genuinely better.
> > > > >
> > > > > > The reframing is: The alignment behavior has three real degrees of
> > > > > > freedom under the hood - where we search (within the
> > > > > > current-to-target range, or above the target), how strict we are
> > > > > > about accepting a parallelism (exact divisor, load reducing, or any
> > > > > > non-empty), and what we do on failure (block, or relax). The
> > > > > > original proposal exposed these as a mode times fallback
> > > > > > cross-product, which is where the complexity came from. Most of that
> > > > > > cross-product is not actually meaningful (many combinations are
> > > > > > redundant or no-ops), and you are right that asking users to
> > > > > > navigate it does not serve them.
> > > > >
> > > > > Revised design, a small named front door over those axes:
> > > > >
> > > > > >    - BALANCED (default) - Avoid skew, and if no clean divisor exists,
> > > > > >    scale anyway rather than get stuck. This reproduces today's
> > > > > >    default behavior exactly.
> > > > > >    - STRICT_DIVISOR - Only scale to an exact divisor between current
> > > > > >    and target. If none exists, do not scale and emit an event.
> > > > > >    - MAXIMIZE_UTILISATION - Always reduce per-subtask load above the
> > > > > >    target, snapping to a divisor when reachable. Unchanged from
> > > > > >    today.
> > > > >
> > > > > > Three everyday modes, which I believe is the "3 reasonable modes"
> > > > > > you asked for, Gyula. The parallelism alignment schema itself is
> > > > > > genuinely complex (a target parallelism interacts with the key group
> > > > > > or partition count across two search regions and several acceptance
> > > > > > policies), so for the small set of advanced users who need to tune
> > > > > > it there is an optional ADVANCED mode. ADVANCED composes the
> > > > > > existing, already-proven built-in strategies as a primary plus an
> > > > > > optional fallback, with a validator that rejects the redundant and
> > > > > > self-referential combinations at config load. It deliberately keeps
> > > > > > the current strategies rather than a reduced model, because that is
> > > > > > exactly the expressiveness an advanced user reaches for. So the
> > > > > > confusing cross-product is gone from the front door, the complete
> > > > > > schema stays available to those who need it, and the few meaningful
> > > > > > compositions are reachable and guarded.
> > > > >
> > > > > > On the EVENLY_SPREAD migration, which I think is the most important
> > > > > > fix: I am retiring the EVENLY_SPREAD token rather than redefining
> > > > > > it. Existing configs that pin EVENLY_SPREAD keep mapping to
> > > > > > BALANCED, the algorithm that string actually had, through the old
> > > > > > key kept as a deprecated option. Anyone who wants the new exact-only
> > > > > > behavior uses STRICT_DIVISOR. That removes the silent behavior
> > > > > > change entirely.
> > > > >
> > > > > > On JobVertexScaler, you are right that it had taken on too much. I
> > > > > > extracted the alignment logic into a dedicated alignment package (it
> > > > > > shrank JobVertexScaler from ~1270 to ~660 lines), which also gives
> > > > > > the strategy resolution and the validator a clean home and removes
> > > > > > the duplicated scale-up and scale-down paths. The built-in
> > > > > > strategies now sit behind an @Experimental AlignmentStrategy
> > > > > > interface, so a pluggable custom-strategy loader is a clean
> > > > > > follow-up FLIP later rather than hard-coded search logic now. If we
> > > > > > do ship that loader, it would follow the same ServiceLoader plugin
> > > > > > convention as FLIP-575 (Scaling Executor Plugin SPI) rather than a
> > > > > > new mechanism, and it is complementary to FLIP-575 since alignment
> > > > > > is a per-vertex step inside the parallelism computation while
> > > > > > FLIP-575 intercepts the final decisions.
> > > > >
> > > > > > On the config key naming you raised on the discussion thread, Gyula,
> > > > > > I agree the current key is poor. Rather than key-group-alignment, I
> > > > > > went one step further to job.autoscaler.scaling.alignment.mode (with
> > > > > > the advanced primary and fallback keys under the same
> > > > > > scaling.alignment prefix), keeping the old key as a fallback so
> > > > > > existing configs keep working. The reason is that the feature aligns
> > > > > > parallelism to key groups or source partitions equally, so a neutral
> > > > > > "alignment" name fits better than "key-group-alignment", and it
> > > > > > matches the new AlignmentMode and AlignmentStrategy types.
> > > > >
> > > > > > I created a draft redesigned FLIP doc [1] and updated the draft PR
> > > > > > [2] reflecting all of the above. Does this direction address the
> > > > > > concerns?
> > > > >
> > > > > Best regards,
> > > > > Dennis
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/18nh9D1fYqErky12WHznzSufXzt6rkm3tGbNkcfgTTvE/edit?usp=sharing
> > > > > [2] https://github.com/apache/flink-kubernetes-operator/pull/1088
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jun 5, 2026 at 9:43 PM <[email protected]> wrote:
> > > > >
> > > > > > Thanks Vivek,
> > > > > >
> > > > > > > I am inclined to agree here that making the config complex this
> > > > > > > way doesn’t really serve most users. If we could create 3
> > > > > > > reasonable modes that would cover most use cases, that would be
> > > > > > > best.
> > > > > >
> > > > > > Cheers
> > > > > > Gyula
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > > > On 5 Jun 2026, at 16:06, Vivek Jhaver <[email protected]> wrote:
> > > > > > >
> > > > > > > Vivek
> > > > > >
> > > > >
> > > >
> > >
> >
>
