If the adaptive scheduler supported all execution modes (Native Application, Session, etc.), including active resource management, then I think we could use it all the time. I would love to use one scheduler instead of having two options.
Currently, however, there is a huge gap in functionality between active and passive resource management, and in my experience the active (native) integration is much more convenient for Kubernetes environments.

Gyula

On Thu, Jan 26, 2023 at 3:13 PM Konstantin Knauf <kna...@apache.org> wrote:

> Hi Gyula,
>
> if the adaptive scheduler supported active resource managers, would there
> be any other blocker to migrating to it? I don't know much about the
> implementation side here, but conceptually, once we have session mode
> support and each job in a session cluster declares its desired
> parallelism (!= infinity), there shouldn't be a big gap to supporting
> active resource managers. Am I missing something, Chesnay?
>
> Regarding the complexity, I was referring to the procedure that Max
> outlines in his ticket around checking whether slots are available and
> then triggering scaling operations. The adaptive scheduler already does
> this and, in my understanding, is more responsive in that regard than an
> external process would be.
>
> Cheers,
>
> Konstantin
>
> On Thu, Jan 26, 2023 at 3:05 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> Hi Konstantin!
>>
>> I think the adaptive scheduler still will not support the Kubernetes
>> native integration and can only be used in standalone mode. This means
>> that the operator needs to manage all resources externally and compute
>> exactly how many new slots are needed during rescaling, etc.
>>
>> I think whatever scaling API we build should work for both standalone
>> and native integration as much as possible. It's not a duplicated effort
>> to add it to the standard scheduler as long as the adaptive scheduler
>> does not support active resource management.
>>
>> Also, it seems this will not reduce complexity on the operator side,
>> which can already do scaling actions by executing an upgrade.
>>
>> And a side note: the operator supports both the native and standalone
>> integration (and thereby both the standard and adaptive scheduler), but
>> the bigger problem is actually computing the required number of slots
>> and required new resources, which is much harder than simply using
>> active resource management.
>>
>> Cheers,
>> Gyula
>>
>> On Thu, Jan 26, 2023 at 2:57 PM Konstantin Knauf <kna...@apache.org>
>> wrote:
>>
>>> Hi Max,
>>>
>>> it seems to me we are now running into some of the potential
>>> duplication of efforts across the standard and adaptive scheduler that
>>> Chesnay had mentioned on the original ticket. The issue of having to do
>>> a full restart of the job for rescaling, as well as waiting for
>>> resources to be available before doing a rescaling operation, were some
>>> of the main motivations behind introducing the adaptive scheduler. In
>>> the adaptive scheduler we can further do things like only triggering a
>>> rescaling operation exactly when a checkpoint has completed, to
>>> minimize reprocessing. For jobs with small state size, the downtime
>>> during rescaling can already be << 1 second today.
>>>
>>> Chesnay and David Moravek are currently in the process of drafting two
>>> FLIPs that will extend the support of the adaptive scheduler to session
>>> mode and will allow clients to change the desired/min/max parallelism
>>> of the vertices of a job during its runtime via the REST API. We
>>> currently plan to publish a draft of these FLIPs next week for
>>> discussion. Would you consider moving to the adaptive scheduler for the
>>> Kubernetes operator, provided these FLIPs make it? I think it has the
>>> potential to simplify the logic required for rescaling on the operator
>>> side quite a bit.
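>>>
>>> Just to give a feeling for the direction (the FLIPs are not published
>>> yet, so the endpoint and the payload below are purely hypothetical,
>>> not the actual API), a client-driven rescale via the REST API could
>>> look roughly like this:
>>>
>>> import java.net.URI;
>>> import java.net.http.HttpClient;
>>> import java.net.http.HttpRequest;
>>> import java.net.http.HttpResponse;
>>>
>>> public class RescaleViaRestSketch {
>>>     public static void main(String[] args) throws Exception {
>>>         String jobId = args[0]; // the running job to rescale
>>>         // Hypothetical JSON shape: per-vertex min/desired/max parallelism.
>>>         // The real path and payload will be defined by the upcoming FLIPs.
>>>         String body = "{\"vertex-1\": {\"min\": 1, \"desired\": 4, \"max\": 8}}";
>>>         HttpRequest request = HttpRequest.newBuilder()
>>>                 .uri(URI.create("http://jobmanager:8081/jobs/" + jobId + "/parallelism"))
>>>                 .header("Content-Type", "application/json")
>>>                 .PUT(HttpRequest.BodyPublishers.ofString(body))
>>>                 .build();
>>>         HttpResponse<String> response = HttpClient.newHttpClient()
>>>                 .send(request, HttpResponse.BodyHandlers.ofString());
>>>         // The adaptive scheduler would pick up the new requirements and
>>>         // rescale in place, without a client-side stop/resubmit cycle.
>>>         System.out.println("HTTP " + response.statusCode());
>>>     }
>>> }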
>>>
>>> Best,
>>>
>>> Konstantin
>>>
>>> On Thu, Jan 26, 2023 at 12:16 PM Maximilian Michels <m...@apache.org>
>>> wrote:
>>>
>>>> Hey ConradJam,
>>>>
>>>> Thank you for your thoughtful response. It would be great to start
>>>> writing a FLIP for the Rescale API. If you want to take a stab at it,
>>>> please go ahead, I'd be happy to review. I'm sure Gyula or others will
>>>> also chime in.
>>>>
>>>> I want to answer your questions so we are aligned:
>>>>
>>>> > ● Does scaling work on YARN, or just K8s?
>>>>
>>>> I think it should work for both YARN and K8s. We would have to make
>>>> changes to the drivers (AbstractResourceManagerDriver), which is
>>>> implemented for both K8s and YARN. The outlined approach for rescaling
>>>> does not require integrating with those systems, just maybe updating
>>>> how the driver is used, so we should be able to make it work across
>>>> both YARN and K8s.
>>>>
>>>> > ● Does rescaling support standalone mode?
>>>>
>>>> Yes, I think it should and easily can. We do use a different type of
>>>> resource manager (StandaloneResourceManager, not ActiveResourceManager),
>>>> but I think the logic will sit on a higher level where the
>>>> ResourceManager implementation is not relevant.
>>>>
>>>> > ● Can we simplify the recovery steps?
>>>>
>>>> For the first version, I would prefer the simple approach of (1)
>>>> acquiring the required slots for rescaling, then (2) triggering a
>>>> stop-with-savepoint, and (3) resubmitting the job with updated
>>>> parallelisms. What you have in mind is a bit more involved but
>>>> certainly a great optimization, especially when only a fraction of the
>>>> job state needs to be repartitioned.
>>>>
>>>> > Of course, there are many details, such as
>>>> > ● At some point we may not be able to use this kind of hot update,
>>>> > and still need to restart the job; when this happens, we should
>>>> > prevent users from using rescaling requests
>>>>
>>>> I'm curious to learn more about "hot updates". How would we support
>>>> this in Flink? Would we have to support dynamically repartitioning
>>>> tasks? I don't think Flink supports this yet. For now, restarting the
>>>> job may be the best we can do.
>>>>
>>>> > ● After rescaling is submitted, when we fail, there should be a
>>>> > rollback mechanism to roll back to the previous degree of
>>>> > parallelism.
>>>>
>>>> This should not be necessary if all the requirements for rescaling,
>>>> e.g. enough task slots, are satisfied by the Rescale API. I'm not even
>>>> sure rolling back is an option, because we can't guarantee that a
>>>> rollback would always work.
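>>>>
>>>> To make the first version concrete, here is a rough sketch of the
>>>> three steps above (the Cluster interface and all names in it are
>>>> hypothetical, nothing here is an existing Flink API):
>>>>
>>>> import java.util.Map;
>>>>
>>>> public final class RescaleFlowSketch {
>>>>
>>>>     /** Hypothetical cluster facade, for illustration only. */
>>>>     interface Cluster {
>>>>         int freeSlots();
>>>>         String stopWithSavepoint(String jobId); // returns the savepoint path
>>>>         void submit(String jobId, Map<String, Integer> perVertexParallelism,
>>>>                 String savepointPath);
>>>>     }
>>>>
>>>>     static void rescale(Cluster cluster, String jobId,
>>>>             Map<String, Integer> newParallelisms, int slotsNeeded) {
>>>>         // (1) Check that the required slots are available *before*
>>>>         //     touching the running job, so we cannot get stuck later.
>>>>         if (cluster.freeSlots() < slotsNeeded) {
>>>>             throw new IllegalStateException(
>>>>                     "Not enough slots, rejecting rescale request");
>>>>         }
>>>>         // (2) Stop the job with a savepoint to capture consistent state.
>>>>         String savepointPath = cluster.stopWithSavepoint(jobId);
>>>>         // (3) Resubmit with the updated per-vertex parallelisms,
>>>>         //     restoring from the savepoint taken in step (2).
>>>>         cluster.submit(jobId, newParallelisms, savepointPath);
>>>>     }
>>>> }
>>>>
>>>> The important property is that step (1) fails fast before the job is
>>>> touched, so a rejected rescale request leaves the job running as-is,
>>>> which is also why no rollback mechanism should be needed.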
>>>>
>>>> Thanks,
>>>> Max
>>>>
>>>> On Tue, Jan 24, 2023 at 6:34 AM ConradJam <jam.gz...@gmail.com> wrote:
>>>>
>>>>> Hello Max,
>>>>>
>>>>> Thanks for driving this. I think there is no problem with your
>>>>> previous suggestion in FLINK-30773 [1]. Here I just want to add some
>>>>> supplementary points and questions, and share some suggestions and
>>>>> insights.
>>>>>
>>>>> I have been using the autoscaling of the Flink K8s Operator for some
>>>>> time. The current method is to stop the job and modify the
>>>>> parallelism, which interrupts the business for a long time. I think
>>>>> the purpose of modifying the Rescale API is to better fit cloud-native
>>>>> environments and reduce the impact of job scaling downtime.
>>>>>
>>>>> I have tried scaling with less downtime, and I call this step "hot
>>>>> updating parallelism" (if there are available slots, there is no need
>>>>> to redeploy the JobManager or TaskManagers on K8s).
>>>>>
>>>>> Around this topic, I raise the following questions:
>>>>> ● Does scaling work on YARN, or just K8s?
>>>>>   ○ I think we can support running on K8s for the first version, and
>>>>>     YARN can be considered later.
>>>>> ● Does rescaling support standalone mode?
>>>>>   ○ I think it can be supported. The essence is just to modify the
>>>>>     parallelism of job vertices. As for the tuning strategy, it should
>>>>>     be determined by the external system or the K8s Operator.
>>>>> ● Can we simplify the recovery steps?
>>>>>   ○ As far as I know, the traditional way to adjust the parallelism is
>>>>>     to stop a job with a savepoint and then run the job with the
>>>>>     adjusted parallelism. If we hide this step inside the JobManager,
>>>>>     it will be an important means to reduce the delay.
>>>>>
>>>>> Of course, there are many details, such as:
>>>>> ● At some point we may not be able to use this kind of hot update and
>>>>>   still need to restart the job; when this happens, we should prevent
>>>>>   users from using rescaling requests.
>>>>> ● After rescaling is submitted, when we fail, there should be a
>>>>>   rollback mechanism to roll back to the previous degree of
>>>>>   parallelism.
>>>>>
>>>>> ...and more ~
>>>>>
>>>>> By the way, since there is a lot of content, I did not expand on all
>>>>> the ideas and descriptions here. This proposal modifies the original
>>>>> Rescale API. I would also like to hear whether @gyula has some new
>>>>> ideas on this, as he was also involved in the development of FLIP-271.
>>>>> I am willing to write a FLIP for this, refine the ideas with the dev
>>>>> community, and then submit it. What do you think about starting a
>>>>> discussion in the community?
>>>>>
>>>>> 1. https://issues.apache.org/jira/browse/FLINK-30773
>>>>>
>>>>> Best~
>>>>>
>>>>> On Tue, Jan 24, 2023 at 1:08 AM Maximilian Michels <m...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The current rescale API appears to be a work in progress. A couple of
>>>>>> years ago, we disabled access to the API [1].
>>>>>>
>>>>>> I'm looking into this problem as part of working on autoscaling [2],
>>>>>> where we currently require a full restart of the job to apply the
>>>>>> parallelism overrides. This adds additional delay and comes with the
>>>>>> caveat that we don't know whether sufficient resources are available
>>>>>> prior to executing the scaling decision. We obviously do not want to
>>>>>> get stuck due to a lack of resources. So a rescale API would have to
>>>>>> ensure enough resources are available prior to restarting the job.
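>>>>>>
>>>>>> For reference, this is roughly what the restart-based flow looks like
>>>>>> on the operator side today (a sketch only: the option key is the
>>>>>> parallelism-overrides option from the autoscaler work, and the vertex
>>>>>> IDs are made-up examples):
>>>>>>
>>>>>> import org.apache.flink.configuration.Configuration;
>>>>>>
>>>>>> public class ParallelismOverridesSketch {
>>>>>>     public static void main(String[] args) {
>>>>>>         Configuration conf = new Configuration();
>>>>>>         // Per-vertex overrides as "jobVertexId:parallelism" pairs.
>>>>>>         // They only take effect when the job is fully restarted with
>>>>>>         // this configuration -- exactly the delay and resource
>>>>>>         // uncertainty a Rescale API would avoid.
>>>>>>         conf.setString("pipeline.jobvertex-parallelism-overrides",
>>>>>>                 "a1b2c3d4e5f60718293a4b5c6d7e8f90:4,"
>>>>>>                         + "0f9e8d7c6b5a49382716a5b4c3d2e1f0:2");
>>>>>>         // ... then stop the job with a savepoint and resubmit it
>>>>>>         // with `conf`.
>>>>>>     }
>>>>>> }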
>>>>>>
>>>>>> I've created an issue here:
>>>>>> https://issues.apache.org/jira/browse/FLINK-30773
>>>>>>
>>>>>> Any comments or interest in working on this?
>>>>>>
>>>>>> -Max
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-12312
>>>>>> [2]
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
>>>>>
>>>>> --
>>>>> Best
>>>>>
>>>>> ConradJam
>>>
>>> --
>>> https://twitter.com/snntrable
>>> https://github.com/knaufk
>
> --
> https://twitter.com/snntrable
> https://github.com/knaufk