Re: [Discussion] Make the scheduler's task selection algorithm pluggable

asquator Tue, 16 Sep 2025 00:58:37 -0700

Thank you for the clarification, I see your point now.

Yes, exposing pluggable features means freezing an API, which is confining and 
just hard to do given the current implementation.
We also don't really want to spend our time doing that because there hasn't 
been such a use case so far, except for fixing bugs.


So, before even talking about making things pluggable for the community, we 
have to make them pluggable at least for ourselves.
We can do it like this:
1. Create the possibility of injection by splitting the scheduler into 
components.
2. Inject the single implementation through a static, hard-coded bootstrap (no 
toggles).
3. Repeat steps 1-2 incrementally across the scheduler.

Regarding the starvation case, I think we can do as follows:
1. Create the TaskSelector abstraction and make it injectable.
2. Subclass the SchedulerJobRunner to override the default TaskSelector in 
the constructor.
3. On merge, replace the default (and hopefully only) strategy, hard-coded again.

This way we create an internal API for ourselves that can easily evolve, make 
the code cleaner and reduce the complexity of the scheduler.
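
To make the starvation steps above concrete, here is a minimal sketch of what 
I mean. Everything in it is hypothetical and only for illustration: the Task 
dataclass, the select() signature, the PriorityTaskSelector and 
FifoTaskSelector classes and the toy SchedulerJobRunner are stand-ins, not 
the real Airflow classes or their APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a schedulable task instance."""

    task_id: str
    priority_weight: int


class TaskSelector(Protocol):
    """Internal seam (assumed shape): picks which candidates get queued."""

    def select(self, candidates: Sequence[Task], max_tis: int) -> list[Task]: ...


class PriorityTaskSelector:
    """Assumed default strategy: highest priority first, capped at max_tis."""

    def select(self, candidates: Sequence[Task], max_tis: int) -> list[Task]:
        ranked = sorted(candidates, key=lambda t: t.priority_weight, reverse=True)
        return ranked[:max_tis]


class FifoTaskSelector:
    """Hypothetical experimental strategy: keep arrival order."""

    def select(self, candidates: Sequence[Task], max_tis: int) -> list[Task]:
        return list(candidates)[:max_tis]


class SchedulerJobRunner:
    """Toy runner, NOT Airflow's real class. The default is hard-coded here."""

    def __init__(self, task_selector: TaskSelector | None = None) -> None:
        # No config toggle: the only way to change the strategy is in code.
        if task_selector is None:
            task_selector = PriorityTaskSelector()
        self._task_selector = task_selector

    def tasks_to_queue(self, candidates: Sequence[Task], max_tis: int) -> list[Task]:
        return self._task_selector.select(candidates, max_tis)


class ExperimentalSchedulerJobRunner(SchedulerJobRunner):
    """Subclass used only while testing a new strategy outside main."""

    def __init__(self) -> None:
        super().__init__(task_selector=FifoTaskSelector())


if __name__ == "__main__":
    tasks = [Task("a", 1), Task("b", 5), Task("c", 3)]
    print([t.task_id for t in SchedulerJobRunner().tasks_to_queue(tasks, 2)])
    print([t.task_id for t in ExperimentalSchedulerJobRunner().tasks_to_queue(tasks, 2)])
```

On merge, step 3 would amount to changing the hard-coded default inside the 
constructor and deleting the experimental subclass, so nothing user-facing 
is ever exposed.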


On Tuesday, September 16th, 2025 at 2:38 AM, Jarek Potiuk <ja...@potiuk.com> 
wrote:

> > A separation of concerns is highly desired for the scheduler and we
> > should make it BETTER, not WORSE.
>
> Yes. I agree scheduler could be refactored and likely more modularised.
> Step-by-step, with verifying performance and impact it has. You are quite
> right that it has been a bit monolithic. And you are absolutely free to
> propose incremental, well tested changes that will make it better
> modularized. This should be the start, not pluggability. There are trade-offs
> between modularity and performance (especially in Python), but yes, likely
> when done well and in small increments, I agree it could be modularized
> without too much impact.
>
> > > There is absolutely no way we can make it available for users to override
> > > and use their own implementation - because we will have to support whatever
> > > someone implemented.
> >
> > No. This is absolutely wrong. We won't accept any kind of implementation
> > that solves some specific edge case - not at all.
>
> You do not seem to account for open-source ways of work. I do not know how
> much experience you have in maintaining open-source projects with tens of
> thousands of users who can "write their own code that plugs into yours".
> When you are maintainer of the open-source component that exposes an API of
> some sort, you will have potentially 1000s of implementations done by
> others and if they have problems with it - they will come with THEIR
> problem to you and demand we solve and diagnose them, because - especially
> in kind of such critical section APIs you either pay huge overhead on
> isolating things (and lose performance) or plugin framework exposes the
> API in the way that it's almost impossible to diagnose and fix problems
> without detailed knowledge of both sides of the implementation - which
> makes it particularly difficult if one side are maintainers (who are
> volunteers essentially) and the other are commercial users who have their
> bosses and production traffic and will essentially demand those
> volunteers to solve their problems ("because you have the API and it should
> work - look i have 2000 lines of code implementing your Plugin API and it
> does not work, you must solve it ASAP, and BTW we cannot show you the
> crucial part of it because it's our IP").
>
> > Pluggability is a good thing so everyone can inject things of their own.
>
>
> Not always. There are cases when it is, and cases when it is not. It really
> depends - like everything in software, there are trade-offs. Allowing
> someone to inject things, means that you have to lock the API - which
> instantly limits your flexibility. This is all but given. Opening for
> pluggable implementations means that you deliberately give up on
> flexibility. Full stop. This is simply a fact. And in this case it means
> that when someone has a problem in their implementation of "critical
> section" plugin, things that I described above simply will happen. So
> no, it's extremely unlikely that we will allow someone to develop their own
> plugins for the scheduler of ours. It's what Ash wrote - yes - they can
> subclass the whole class and implement everything - then they will be
> responsible for everything - anything where potential bug, diagnosis
> and problem solving span both "our" code and "their" code is absolutely
> hard NO. This will simply not be something we - as maintainers - want to
> take responsibility for. And it's a very conscious decision.
>
> > We won't support all kinds of community scheduling strategies in the main
> > tree, to clarify - we won't support any, except the one working well in all
> > cases.
>
> No. We are talking about the necessity of supporting "others" pieces of
> code intertwined with ours. If we open plugin API for us this automatically
> means "we will support you and help you solve problems in the
> implementation of the API you've done". This is not happening as far as I
> am concerned.
>
> > If we test outside of the main repo, we shouldn't care how the strategy
> > is selected, but inheritance is a messy approach and a pretty bad pattern
> > here.
>
> No. It's very clean "implementation" separation between "Apache Airflow
> community maintained code" and "Other maintained code". While technically
> speaking - when you design and completely own your own implementation, this
> is the only reasonable way to allow others to have their own code that they
> are fully responsible for and end-2-end manage and test it.
>
> > Let's focus on solving starvation, and just do the coding right, adhering
> > to SRP and minimizing the maintenance burden.
>
> I propose a different approach. If you have ideas on modularising the
> Scheduler in an incremental, fully tested way (automatically, with
> regression tests, including performance) - feel free.
> But the end goal of that should not be an "externally pluggable" scheduler,
> but maybe eventually a way how the proposed starvation preventing algorithm
> might be added to our code - possibly as a selectable option. Let's start
> with code cleaning (maybe - if you can implement and demonstrate that you
> understand and follow all the cases we have now and are capable of
> implementing such change in the way that will be clean, testable and
> performant - we can think about next steps).
>
> One of the things in ASF is "meritocracy". People here aspire to be
> committers and PMC members by following the principle that they are capable
> (and demonstrate) of understanding such aspects of OSS projects as
> maintainability, cooperation and collaboration with contributors and users,
> understanding that "incremental" and slow changes with stability and having
> "community first" approach. And you need to prove all that before you are
> invited to be a committer (and later PMC member if you also prove that you
> can think on a higher level, follow Apache Way and direct the project in
> "product" ways).
>
> Going through such a cleanup exercise first might be a good way to
> demonstrate it, and then we can think about next steps (but again,
> pluggability for "external" code in the scheduler in the way to
> intertwine it with our code is very, very unlikely to happen).
>
> That would be my proposal for you to utilise your ideas best, and with
> community in mind.
>
> J,
>
>
>
>
> On Mon, Sep 15, 2025 at 4:14 PM asquator asqua...@proton.me wrote:
>
> > Some parts got swallowed by the markdown blockquotes. Those reading on the
> > Apache website, please unwrap them.
> >
> > On Tuesday, September 16th, 2025 at 1:08 AM, asquator asqua...@proton.me
> > wrote:
> >
> > > Hello!
> > >
> > > First some updates regarding the #54392 PR:
> > > Contributions to the PR have been halted. See the PR itself for more
> > > information.
> > > A new PR was opened to address the general problem of starvation,
> > > utilizing stored SQL functions/procedures and any reviews are welcome:
> > > https://github.com/apache/airflow/pull/55537
> > >
> > > My position on pluggable scheduler is that every piece of software,
> > > especially complex software must be split into smaller, independent
> > > components which are made pluggable whether internally (bootstrap
> > > files) or configurations. It has been said above that the
> > > scheduler's code is
> > > exceptionally "complex", and I completely disagree with that. It's not
> > > complex but cumbersome, dirty, overloaded and highly monolithic. We have a
> > > function called _executable_task_instances_to_queued having 355 (!) lines
> > > and 4 (!) levels of nesting. This opposes ANY normal clean code standards
> > > which is kind of... BAD. This is what makes the scheduler "complex",
> > > difficult to change, and difficult for newcomers to step into. This was
> > > just one example, but the entire class is written like that. Sometimes I
> > > have a feeling it has been intentionally sabotaged to look this way, and
> > > it's sad.
> > >
> > > > Roughly speaking the scheduler has three main responsibilities
> > >
> > > Exactly! This is a big problem for the SRP. The scheduler should be a
> > > facade that just triggers different steps, instead of one large
> > > incomprehensible `while True: do_everything()` script as it looks now. IMO
> > > the independent steps should even run asynchronously instead of current
> > > sequential execution. It will both make the code cleaner and produce more
> > > efficient results. One class should not do "three main responsibilities".
> > > Never. Over time the industry requirements will shift towards running
> > > millions and tens of millions of tasks daily, and new solutions will be
> > > required to support these requirements. The way things go today, it
> > > will be very hard to introduce global changes. The scheduler code
> > > looks "complex"
> > > because it was made so. Inherently it's a very simple logic - query the
> > > tasks, loop over them and log some stuff, we just have too much detail in
> > > one file and it's frustrating. For the sake of the SRP I think we must
> > > split the scheduler one day, and any friction blocking this refactor is
> > > another nail in the project's maintainability coffin.
> > >
> > > A complete refactor will be a hard thing to do, so incremental changes
> > > are much more feasible to introduce. Task selection logic is an important
> > > part that should be taken out to another component. Here we both fix the
> > > starvation and do a good thing for the project instead of burying it even
> > > deeper.
> > >
> > > ---
> > >
> > > Now that we're done with the clean code topic, let's talk about the
> > > maintenance overhead so feared by maintainers.
> > > I claim that plugin architecture does not inherently mean more effort to
> > > support any kind of community implementations.
> > >
> > > > There is absolutely no way we can make it available for users to
> > > > override and use their own implementation - because we will have to
> > > > support whatever someone implemented.
> > >
> > > No. This is absolutely wrong. We won't accept any kind of implementation
> > > that solves some specific edge case - not at all. The main branch will
> > > include just one (at most two) generally accepted and tested
> > > implementations. If someone feels like writing their own version -
> > > let them do that in their fork for their business needs.
> > > It should never be in main until it's useful for the entire community.
> > > If someone needs their specific behavior - let them do it, we won't
> > > support it as it's in their fork. Plugin architecture means the
> > > ability to quickly
> > > change a subcomponent to another one, not the necessity to support all
> > > kinds of plugins. We just define a single API and stick to it. We've been
> > > researching the starvation problem for half a year now and tried all kinds
> > > of fixes. Until the component is pluggable, it was a real pain to check
> > > something new.
> > >
> > > Let's connect it to our case:
> > > We have the #54284 PR which is designed to solve a particular issue
> > > @dstandish described. If this logic solves the problem for them, I have no
> > > objection to their adoption of this strategy as a custom plugin. I don't
> > > see how it can be merged into main, because they did a very particular fix
> > > that won't work for everybody - it will be a burden for the devs, but may
> > > be a salvation for their team. My position here is making it easy for them
> > > to switch to this strategy using plugin architecture, without ever taking
> > > responsibility for their code. My team experienced a similar issue but for
> > > pools instead of DAGs. We've been considering creating a patch like
> > > #54284, but we dug deeper and found the root of the problem, so this
> > > patch was
> > > never created. I agree, we shouldn't pollute the repo with small patches -
> > > it will be hell.
> > >
> > > We also have the #55537 PR which is designed to solve the issue for
> > > everyone. As this implementation claims to replace the current, optimistic
> > > scheduler (claiming to be "just better"), I think it can certainly coexist
> > > with the optimistic for a release or so. The steps are:
> > > 1. Testing and benchmarks outside the main tree (by enthusiasts)
> > > 2. Merging and wide testing by the community, with the ability to switch
> > > back on failure
> > > 3. Deprecation of the optimistic strategy in case the new strategy is
> > > really "just better"
> > >
> > > To be honest, I don't care at all if the testing is done out of main
> > > (it's reasonable), but IMO the second step is still desirable because we
> > > cannot expect everyone to test their workflows with the new strategy
> > > in the fork. It implies switching repos, redeploying the chart and
> > > doing many
> > > unnecessary steps. A configuration is much simpler (remember, the new
> > > strategy is in main only after preliminary testing shows good results).
> > > It's just another safety step to decrease the chance of breaking people's
> > > production workflows, as a core component is changed. Regarding
> > > subclassing `SchedulerJobRunner` - it's a very bad practice. There's
> > > absolutely no
> > > reason to subclass the entire job class to swap one single component. It's
> > > just cumbersome and requires splitting this poor "god class" to even
> > > smaller methods nobody understands. If we decide to NOT test the new
> > > strategy in main but just replace the current one (I say it's less safe,
> > > but possible), then it shouldn't bother us at all ATP - whether it's a
> > > subclass or a configuration - as it will be taken down anyway.
> > > We have to focus on finding a good strategy to become the main one,
> > > benchmark it and understand the implications of switching to it - I hope
> > > #55537 may be a good candidate.
> > >
> > > ---
> > >
> > > Regarding research papers - I don't think it's so hard to find a
> > > strategy that just works for all cases. From an academic viewpoint,
> > > we have a very simple case of non-preemptible single-trigger
> > > scheduling with
> > > priorities that can be solved with one sort and a linear scan. This is
> > > basically an entry level leetcode problem. The main difficulty was to find
> > > something that works in our case considering:
> > > 1. The code is in Python
> > > 2. The tasks are in SQL
> > > and giving the best performance with fewer network hops.
> > > I can say we had a great progress, and I'll give a broader description
> > > of the new approach we're trying now in a corresponding mail topic later.
> > >
> > > ---
> > >
> > > TL;DR:
> > > A separation of concerns is highly desired for the scheduler and we
> > > should make it BETTER, not WORSE.
> > > Pluggability is a good thing so everyone can inject things of their own.
> > > We won't support all kinds of community scheduling strategies in the
> > > main tree, to clarify - we won't support any, except the one working well
> > > in all cases.
> > > If we test outside of the main repo, we shouldn't care how the strategy
> > > is selected, but inheritance is a messy approach and a pretty bad pattern
> > > here.
> > > Let's focus on solving starvation, and just do the coding right,
> > > adhering to SRP and minimizing the maintenance burden.
> > >
> > > On Monday, September 15th, 2025 at 10:23 PM, Natanel
> > > natanelrud...@gmail.com wrote:
> > >
> > > > Hello.
> > > >
> > > > Me and Asquator have already been through this issue, and we have,
> > > > what we think, is a decent implementation of pluggable task selection
> > > > algorithm for airflow.
> > > > (which we have implemented here
> > > > https://github.com/Asquator/airflow/tree/feature/pessimistic-task-fetching-with-window-function
> > > > )
> > > >
> > > > I agree that no perfect solution will ever exist in airflow for all use
> > > > cases, regarding task selection, which is why this is probably a
> > > > necessity more than a Nice To Have feature.
> > > >
> > > > In the current way we implemented it, we can have a few pre implemented
> > > > algorithms, that solve different issues, as not all users will
> > > > encounter all issues, and by making them pluggable correctly, with a
> > > > configuration, we can include the documentation on when to use a
> > > > specific task selection algorithm, just like Jarek Potiuk proposed. It
> > > > will not be customizable, but rather injectable inside of the
> > > > airflow-core package.
> > > >
> > > > Of course there are risks that come along with it, like users abusing
> > > > it and trying to create a new task selection algorithm for each edge
> > > > case or use case they have, which can become hard to maintain and
> > > > follow, however, I do not agree that it makes it harder to maintain (in
> > > > terms of code amount), or easier to make mistakes, though, if
> > > > implemented correctly, each task selector is independent, and acts as a
> > > > black box, has a simple api, and can be interchanged without any code
> > > > changes, which makes it, in my opinion, easier to maintain existing
> > > > algorithms, and removes the need to change a single big and sloppy file
> > > > (as it is now).
> > > > In fact, I am certain that making it pluggable will simplify the
> > > > scheduler altogether as now, different parts will be clearly separated
> > > > in different files and directories.
> > > >
> > > > Allowing the injectable algorithms, does give more flexibility, and can
> > > > even make adding the new priority weights algorithm quite simple, and
> > > > not cause any massive changes.
> > > >
> > > > The main downside is that we have to choose an api very carefully, as
> > > > when we add it, it will be exceptionally hard to change it, as it would
> > > > mean changing it in multiple places, and so it would be considered a
> > > > breaking change.
> > > >
> > > > On Mon, 1 Sept 2025 at 18:36, Christos Bisias christos...@gmail.com
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > A while back I started a discussion on the mailing list regarding
> > > > > making some changes to the task selection query in order to improve
> > > > > the scheduler's throughput.
> > > > >
> > > > > https://github.com/apache/airflow/pull/54103
> > > > >
> > > > > Another topic came up during that discussion related to task
> > > > > starvation due to the current selection algorithm. There are two open
> > > > > PRs with different fixes for that issue.
> > > > >
> > > > > https://github.com/apache/airflow/pull/54284
> > > > >
> > > > > https://github.com/apache/airflow/pull/53492
> > > > >
> > > > > Everyone has his own needs and it's probable that a good number of
> > > > > users won't experience the starvation issue.
> > > > >
> > > > > Each approach has its own advantages and disadvantages and for that
> > > > > reason it doesn't feel like there is a right or wrong approach here or
> > > > > a single solution for all.
> > > > >
> > > > > There have been papers on task selection algorithms like this one
> > > > >
> > > > > https://ieeexplore.ieee.org/document/9799199
> > > > >
> > > > > I would like to suggest refactoring the scheduler so that the task
> > > > > selection algorithm can be pluggable. The current implementation will
> > > > > be the default. Everyone will be able to configure the path to his own
> > > > > class. That will be the most beneficial to the majority of users.
> > > > >
> > > > > In the future, anyone could create a PR with his implementation and
> > > > > if enough people like it, it could be added to the repo.
> > > > >
> > > > > This has already been done for the priority weights algorithm, so
> > > > > why not in this case as well?
> > > > >
> > > > > https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule
> > > > >
> > > > > If there is positive feedback on this idea, I would like to
> > > > > implement it.
> > > > >
> > > > > Please let me know what you think. Thank you!
> > > > >
> > > > > Regards,
> > > > > Christos
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org

