Re: Reworking the Rescale API

Konstantin Knauf Thu, 26 Jan 2023 05:57:35 -0800

Hi Max,

it seems to me we are now running into some of the potential duplication of
efforts across the standard and adaptive scheduler that Chesnay had
mentioned on the original ticket. Having to do a full restart of the Job
for rescaling, as well as having to wait for resources to be available
before doing a rescaling operation, were some of the main motivations behind
introducing the adaptive scheduler. In the adaptive scheduler we can go
further and, for example, trigger a rescaling operation only right after
a checkpoint has completed, to minimize reprocessing. For Jobs with small
state size, the downtime during rescaling can already be well under 1 second
today.
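
To illustrate the checkpoint-aligned trigger described above, here is a
minimal Java sketch. CheckpointAlignedRescaler and its methods are
hypothetical names made up for this example and are not part of Flink or
the adaptive scheduler. The sketch only shows the idea: a requested
parallelism is held back and applied when the next checkpoint completes,
so that little work done after the last checkpoint has to be reprocessed.

```java
import java.util.Optional;

/**
 * Hypothetical helper (not a Flink class): defers a requested parallelism
 * change until the next checkpoint has completed.
 */
final class CheckpointAlignedRescaler {
    private int currentParallelism;
    private Integer pendingParallelism; // null while no change is requested

    CheckpointAlignedRescaler(int initialParallelism) {
        this.currentParallelism = initialParallelism;
    }

    /** Records the desired parallelism; nothing is applied yet. */
    void requestRescale(int newParallelism) {
        if (newParallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        if (newParallelism == currentParallelism) {
            pendingParallelism = null; // request is already satisfied
        } else {
            pendingParallelism = newParallelism;
        }
    }

    /**
     * Assumed to be called whenever a checkpoint completes. Applies a
     * pending request, if any, and returns the new parallelism.
     */
    Optional<Integer> onCheckpointCompleted(long checkpointId) {
        if (pendingParallelism == null) {
            return Optional.empty();
        }
        currentParallelism = pendingParallelism;
        pendingParallelism = null;
        return Optional.of(currentParallelism);
    }

    int currentParallelism() {
        return currentParallelism;
    }
}
```

A caller would invoke requestRescale(4) at any time and see the change
take effect only on the following onCheckpointCompleted(...) call.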


Chesnay and David Moravek are currently in the process of drafting two
FLIPs that will extend the support of the adaptive scheduler to session
mode and will allow clients to change the desired/min/max parallelism of
the vertices of a Job during its runtime via the REST API. We currently
plan to publish a draft of these FLIPs next week for discussion. Would you
consider moving to the adaptive scheduler for the Kubernetes operator
provided these FLIPs make it? I think it has the potential to simplify the
logic required for rescaling on the operator side quite a bit.

Best,

Konstantin


On Thu, Jan 26, 2023 at 12:16 PM Maximilian Michels <m...@apache.org>
wrote:

> Hey ConradJam,
>
> Thank you for your thoughtful response. It would be great to start writing
> a FLIP for the Rescale API. If you want to take a stab, please go ahead,
> I'd be happy to review. I'm sure Gyula or others will also chime in.
>
> I want to answer your question so we are aligned:
>
> > ● Does scaling work on YARN, or just k8s?
>
> I think it should work for both YARN and K8s. We would have to make changes
> to the drivers (AbstractResourceManagerDriver), which are implemented for
> both K8s and YARN. The outlined approach for rescaling does not require
> integrating with those systems, just maybe updating how the driver is used,
> so we should be able to make it work across both YARN and K8s.
>
> > ● Does rescaling support Standalone mode?
>
> Yes, I think it should and easily can. We do use a different type of
> resource manager (StandaloneResourceManager, not ActiveResourceManager) but
> I think the logic will sit on a higher level where the ResourceManager
> implementation is not relevant.
>
> > ● Can we simplify the recovery steps?
>
> For the first version, I would prefer the simple approach of (1) acquiring
> the required slots for rescaling, (2) triggering a stop with savepoint, and
> (3) resubmitting the job with the updated parallelisms. What you have in
> mind is a bit more involved but certainly a great optimization, especially
> when only a fraction of the job state needs to be repartitioned.
>
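
As an illustration of the three-step sequence quoted above, here is a
minimal Java sketch. RescaleTarget and SimpleRescale are hypothetical
names invented for this example; none of the methods are Flink APIs.
Using the new parallelism as the number of required slots is also a
simplifying assumption. The point is only the ordering: slots are
acquired first, and the running job is left untouched if that fails.

```java
/** Hypothetical abstraction over the cluster; not a Flink interface. */
interface RescaleTarget {
    /** Tries to reserve the given number of slots; false if unavailable. */
    boolean reserveSlots(int requiredSlots);

    /** Stops the job with a savepoint and returns the savepoint path. */
    String stopWithSavepoint();

    /** Resubmits the job from the savepoint with the new parallelism. */
    void resubmit(String savepointPath, int parallelism);
}

final class SimpleRescale {
    private SimpleRescale() {}

    /**
     * Returns true if the job was rescaled, false if the slots could not
     * be acquired and the job was left running unchanged.
     */
    static boolean rescale(RescaleTarget target, int newParallelism) {
        // (1) Acquire the required slots before touching the job.
        //     Assumption: one slot per parallel instance.
        if (!target.reserveSlots(newParallelism)) {
            return false;
        }
        // (2) Trigger a stop with savepoint.
        String savepointPath = target.stopWithSavepoint();
        // (3) Resubmit the job with the updated parallelism.
        target.resubmit(savepointPath, newParallelism);
        return true;
    }
}
```
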
> > Of course, there are many details, such as
> > ● At some point we may not be able to use this kind of hot update and
> > will still need to restart the job. When this happens, we should prevent
> > users from submitting rescaling requests
>
> I'm curious to learn more about "hot updates". How would we support this in
> Flink? Would we have to support dynamically repartitioning tasks? I don't
> think Flink supports this yet. For now, restarting the job may be the best
> we can do.
>
> > ● After a rescaling request is submitted, if it fails, there should be a
> > rollback mechanism to return to the previous degree of parallelism.
>
> This should not be necessary if all the requirements for rescaling, e.g.
> enough task slots, are satisfied by the Rescale API. I'm not even sure
> rolling back is an option because we can't guarantee that a rollback would
> always work.
>
> Thanks,
> Max
>
> On Tue, Jan 24, 2023 at 6:34 AM ConradJam <jam.gz...@gmail.com> wrote:
>
> > Hello Max
> >
> > Thanks for driving this. I think there is no problem with your previous
> > suggestion in FLINK-30773 [1]. Here I just put forward some additions
> > and open questions, along with some suggestions and insights.
> >
> > I have used the autoscaling of the Flink K8S Operator for some time.
> > The current method is to stop the job and modify the parallelism,
> > which interrupts the business for a long time. I think the purpose of
> > modifying the Rescaling API is to better fit cloud-native environments
> > and reduce the downtime caused by job scaling.
> >
> > I have tried scaling with less downtime, and I call this step "hot update
> > parallelism" (if there are available slots, there is no need to re-deploy
> > the JobManager or TaskManager on K8S).
> >
> > Around this topic, I raised the *following questions*:
> > ● Does scaling work on YARN, or just k8s?
> >    ○ I think we can support running on K8S for the first version, and
> >      YARN can be considered later
> > ● Does rescaling support Standalone mode?
> >    ○ I think it can be supported. The essence is just to modify the
> >      parallelism of job vertices. As for the tuning strategy, it should
> >      be determined by the external system or the K8S Operator
> > ● Can we simplify the recovery steps?
> >    ○ As far as I know, the traditional way to adjust the parallelism is
> >      to stop a job with a savepoint, and then run the job with the
> >      adjusted parallelism. If we hide this step in the *JobManager*, it
> >      will be an important means to reduce the delay.
> >
> > Of course, there are many details, such as
> > ● At some point we may not be able to use this kind of hot update and
> > will still need to restart the job. When this happens, we should prevent
> > users from submitting rescaling requests
> > ● After a rescaling request is submitted, if it fails, there should be a
> > rollback mechanism to return to the previous degree of parallelism.
> >
> > and more ～
> >
> > By the way, because there may be a lot of content, I did not expand on
> > more ideas and descriptions here. This proposal modifies the original
> > Rescaling API. I would also like to hear whether *@gyula* has some new
> > ideas on this, since Gyula was also involved in the development of
> > FLIP-271. I am willing to write a FLIP for this purpose, refine the ideas
> > with the dev community, and then submit it. What do you think about
> > starting a discussion in the community?
> >
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-30773
> >
> > Best～
> >
> > On Tue, Jan 24, 2023 at 1:08 AM Maximilian Michels <m...@apache.org> wrote:
> >
> > > Hi,
> > >
> > > The current rescale API appears to be a work in progress. A couple
> > > years ago, we disabled access to the API [1].
> > >
> > > I'm looking into this problem as part of working on autoscaling [2]
> > > where we currently require a full restart of the job to apply the
> > > parallelism overrides. This adds additional delay and comes with the
> > > caveat that we don't know whether sufficient resources are available
> > > prior to executing the scaling decision. We obviously do not want to
> > > get stuck due to a lack of resources. So a rescale API would have to
> > > ensure enough resources are available prior to restarting the job.
> > >
> > > I've created an issue here:
> > > https://issues.apache.org/jira/browse/FLINK-30773
> > >
> > > Any comments or interest in working on this?
> > >
> > > -Max
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-12312
> > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
> > >
> >
> >
> > --
> > Best
> >
> > ConradJam
> >
> >
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk
