> The adaptive scheduler only supports streaming jobs. That's the biggest
> limitation that probably won't be fixed anytime soon.

Since FLIP-283 [1] has been accepted, I think this limitation might have
already been addressed to a certain extent.

I'd be completely fine with having a separate scheduler for batch and
streaming (maybe we could build a hybrid one at some point that
automatically switches between the two).

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-283%3A+Use+adaptive+batch+scheduler+as+default+scheduler+for+batch+jobs

On Fri, Jan 27, 2023 at 9:58 AM Chesnay Schepler <ches...@apache.org> wrote:

> The adaptive scheduler only supports streaming jobs. That's the biggest
> limitation that probably won't be fixed anytime soon.
> The goal, though, was to make the adaptive scheduler the default for
> streaming jobs eventually.
> It was very much meant as a better version of the default scheduler for
> streaming jobs.
>
> On 26/01/2023 19:06, David Morávek wrote:
> > Hi Gyula,
> >
> >
> >> can you please explain why the AdaptiveScheduler is not the default
> >> scheduler?
> >
> > There are still some smaller bits missing. As far as I know, the missing
> > parts are:
> >
> > 1) Local recovery (reusing the already downloaded state files after
> >    restart / rescale)
> > 2) Support for fine-grained resource management
> > 3) Support for the session cluster (Chesnay will be submitting a FLIP for
> >    this soon)
> >
> > We're looking into addressing all of these limitations in the short term.
> >
> > Personally, I'd love to start a discussion about transitioning the
> > AdaptiveScheduler into a default one after those limitations are fixed.
> > Being able to eventually deprecate and remove the DefaultScheduler would
> > simplify the code-base by a lot since there are many adapters between new
> > and old interfaces (e.g. SlotPool-related interfaces).
> >
> > Best,
> > D.
> >
> > On Thu, Jan 26, 2023 at 6:27 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> >> Chesnay,
> >>
> >> Seems like you are suggesting that the Adaptive scheduler does everything
> >> the standard scheduler does and more.
> >>
> >> I am clearly not an expert on this topic but can you please explain why the
> >> AdaptiveScheduler is not the default scheduler?
> >> If it can do everything, why do we even have 2 schedulers? Why not simply
> >> drop the "old" one?
> >>
> >> That would probably clear up all confusions then :)
> >>
> >> Gyula
> >>
> >> On Thu, Jan 26, 2023 at 6:23 PM Chesnay Schepler <ches...@apache.org>
> >> wrote:
> >>
> >>> There's the default and reactive mode; nothing else.
> >>> At their core they are the same thing; reactive mode just cranks up the
> >>> desired parallelism to infinity and enforces certain assumptions (e.g.,
> >>> no active resource management).
> >>>
> >>> The advantage is that the adaptive scheduler can run jobs while
> >>> insufficient resources are available, and scale things up again once they
> >>> are available.
> >>> This is its core functionality, but we always intended to extend it
> >>> such that users can modify the parallelism at runtime as well.
> >>> And since the AS can already rescale jobs (and was purpose-built with
> >>> that functionality in mind), this is just a matter of exposing an API
> >>> for it. Everything else is already there.
> >>>
> >>> As a concrete use-case, let's say you have an SLA that says jobs must
> >>> not be down longer than X seconds, and a TM just crashed.
> >>> If you can absolutely guarantee that your k8s cluster can provision a
> >>> new TM within X seconds, no matter what cruel reality has in store for
> >>> you, then you /may/ not need it.
> >>> If you can't, well then here's a use-case for you.
> >>>
> >>> > Last time I looked they implemented the same interface and the same
> >>> > base class. Of course, their behavior is quite different.
> >>>
> >>> They never shared a base class since day 1. Are you maybe mixing up the
> >>> AdaptiveScheduler and AdaptiveBatchScheduler?
> >>>
> >>> As for FLINK-30773, I think that should be covered.
> >>>
> >>> On 26/01/2023 17:10, Maximilian Michels wrote:
> >>>> Thanks for the explanation. If not for the "reactive mode", what is
> >>>> the advantage of the adaptive scheduler? What other modes does it
> >>>> support?
> >>>>
> >>>>> Apart from implementing the same interface the implementations of the
> >>>>> adaptive and default schedulers are separate.
> >>>> Last time I looked they implemented the same interface and the same
> >>>> base class. Of course, their behavior is quite different.
> >>>>
> >>>> I'm still very interested in learning about the future FLIPs
> >>>> mentioned. Based on the replies, I'm assuming that they will support
> >>>> the changes required for
> >>>> https://issues.apache.org/jira/browse/FLINK-30773, or at least provide
> >>>> the basis for implementing them.
> >>>>
> >>>> -Max
> >>>>
> >>>> On Thu, Jan 26, 2023 at 4:57 PM Chesnay Schepler <ches...@apache.org> wrote:
> >>>>> On 26/01/2023 16:18, Maximilian Michels wrote:
> >>>>>
> >>>>> I see slightly different goals for the standard and the adaptive
> >>>>> scheduler. The adaptive scheduler's goal is to adapt the Flink job
> >>>>> according to the available resources.
> >>>>>
> >>>>> This is really a misconception that we just have to stomp out.
> >>>>>
> >>>>> This statement only applies to reactive mode, a special mode that the
> >>>>> adaptive scheduler (AS) can run in, where active resource management is
> >>>>> not supported since requesting infinite resources from k8s doesn't really
> >>>>> make sense.
> >>>>> The AS itself can work perfectly fine with active resource management,
> >>>>> and has no effect on how the RM talks to k8s.
> >>>>> It can just keep the job running in cases where fewer resources than
> >>>>> desired (== user-provided parallelism) are provided by k8s (possibly
> >>>>> temporarily).
> >>>>> On 26/01/2023 16:18, Maximilian Michels wrote:
> >>>>>
> >>>>> After all, both schedulers share the same super class
> >>>>>
> >>>>> Apart from implementing the same interface the implementations of the
> >>>>> adaptive and default schedulers are separate.
> >>>
> >>>
> >