Re: [Discussion] Make the scheduler's task selection algorithm pluggable

asquator Mon, 15 Sep 2025 15:14:03 -0700

Some parts got swallowed by the markdown blockquotes. Those reading on the 
Apache website, please unwrap them.


On Tuesday, September 16th, 2025 at 1:08 AM, asquator <asqua...@proton.me> 
wrote:

> Hello!
>
> First some updates regarding the #54392 PR:
> Contributions to the PR have been halted. See the PR itself for more 
> information.
> A new PR was opened to address the general problem of starvation, using 
> stored SQL functions/procedures. Any reviews are welcome:
> https://github.com/apache/airflow/pull/55537
>
> My position on a pluggable scheduler is that every piece of software, 
> especially complex software, must be split into smaller, independent 
> components that are pluggable, whether internally (bootstrap files) or 
> through configuration. It has been said above that the scheduler's code is 
> exceptionally "complex", and I completely disagree with that. It's not 
> complex but cumbersome, dirty, overloaded and highly monolithic. We have a 
> function called _executable_task_instances_to_queued having 355 (!) lines and 
> 4 (!) levels of nesting. This violates ANY normal clean code standard, which 
> is kind of... BAD. This is what makes the scheduler "complex", difficult to 
> change, and difficult for newcomers to step into. This was just one example, 
> but the entire class is written like that. Sometimes I have a feeling it has 
> been intentionally sabotaged to look this way, and it's sad.
>
> > Roughly speaking the scheduler has three main responsibilities
>
>
> Exactly! This is a big problem for the SRP. The scheduler should be a facade 
> that just triggers different steps, instead of one large incomprehensible 
> `while True: do_everything()` script as it looks now. IMO the independent 
> steps should even run asynchronously instead of the current sequential 
> execution. That would make the code both cleaner and more efficient. One 
> class should not have "three main responsibilities". Never. Over time the 
> industry requirements will shift towards running millions and tens of 
> millions of tasks daily, and new solutions will be required to support these 
> requirements. The way things go today, it will be very hard to introduce 
> global changes. The scheduler code looks "complex" because it was made so. 
> Inherently the logic is very simple: query the tasks, loop over them and log 
> some stuff. We just have too much detail in one file, and it's frustrating. 
> For the sake of the SRP I think we must split the scheduler one day, and any 
> friction blocking this refactor is another nail in the project's 
> maintainability coffin.
>
> A complete refactor will be a hard thing to do, so incremental changes are 
> much more feasible to introduce. Task selection logic is an important part 
> that should be taken out to another component. Here we both fix the 
> starvation and do a good thing for the project instead of burying it even 
> deeper.
>
> ---
>
> Now that we're done with the clean code topic, let's talk about the 
> maintenance overhead so feared by maintainers.
> I claim that a plugin architecture does not inherently mean more effort to 
> support any kind of community implementations.
>
> > There is absolutely no way we can make it available for users to override 
> > and use their own implementation - because we will have to support whatever 
> > someone implemented.
>
>
> No. This is absolutely wrong. We won't accept any kind of implementation that 
> solves some specific edge case - not at all. The main branch will include 
> just one (at most two) generally accepted and tested implementations. If 
> someone feels like writing their own version - let them do that in their fork 
> for their business needs.
> It should never be in main until it's useful for the entire community. If 
> someone needs their specific behavior - let them do it, we won't support it 
> as it's in their fork. Plugin architecture means the ability to quickly 
> change a subcomponent to another one, not the necessity to support all kinds 
> of plugins. We just define a single API and stick to it. We've been 
> researching the starvation problem for half a year now and tried all kinds of 
> fixes. Without a pluggable component, it has been a real pain to test 
> anything new.
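
A minimal sketch of what "define a single API and stick to it" could look like. 
Everything here is hypothetical and made up for illustration (`Candidate`, 
`TaskSelector`, `PriorityFirstSelector`, `SELECTORS`, `build_selector`); none 
of these names exist in Airflow, and this is not the code from any of the PRs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a scheduled task instance."""

    task_id: str
    priority_weight: int


class TaskSelector(ABC):
    """Hypothetical single API the scheduler would depend on."""

    @abstractmethod
    def select(self, candidates: list[Candidate], max_tis: int) -> list[Candidate]:
        """Return at most max_tis candidates that should be queued."""


class PriorityFirstSelector(TaskSelector):
    """Illustrative strategy: highest priority weight first."""

    def select(self, candidates: list[Candidate], max_tis: int) -> list[Candidate]:
        ranked = sorted(candidates, key=lambda c: c.priority_weight, reverse=True)
        return ranked[:max_tis]


# Hypothetical registry: main ships one (at most two) entries; a fork can add more.
SELECTORS: dict[str, type[TaskSelector]] = {
    "priority_first": PriorityFirstSelector,
}


def build_selector(name: str) -> TaskSelector:
    """Resolve a strategy by its configured name (the config key is assumed)."""
    try:
        return SELECTORS[name]()
    except KeyError:
        raise ValueError(f"unknown task selector {name!r}") from None
```

The scheduler would only ever call `select()`, so swapping a strategy means 
changing one configured name, not subclassing the whole job runner.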
>
> Let's connect it to our case:
> We have the #54284 PR which is designed to solve a particular issue 
> @dstandish described. If this logic solves the problem for them, I have no 
> objection to their adoption of this strategy as a custom plugin. I don't see 
> how it can be merged into main, because they did a very particular fix that 
> won't work for everybody - it will be a burden for the devs, but may be a 
> salvation for their team. My position here is making it easy for them to 
> switch to this strategy using plugin architecture, without ever taking 
> responsibility for their code. My team experienced a similar issue but for 
> pools instead of DAGs. We've been considering creating a patch like #54284, 
> but we dug deeper and found the root of the problem, so this patch was never 
> created. I agree, we shouldn't pollute the repo with small patches - it will 
> be hell.
>
> We also have the #55537 PR which is designed to solve the issue for everyone. 
> As this implementation claims to replace the current, optimistic scheduler 
> (claiming to be "just better"), I think it can certainly coexist with the 
> optimistic one for a release or so. The steps are:
> 1. Testing and benchmarks outside the main tree (by enthusiasts)
> 2. Merging and wide testing by the community, with the ability to switch back 
> on failure
> 3. Deprecation of the optimistic strategy in case the new strategy is really 
> "just better"
>
> To be honest, I don't care at all if the testing is done out of main (it's 
> reasonable), but IMO the second step is still desirable because we cannot 
> expect everyone to test their workflows with the new strategy in the fork. It 
> implies switching repos, redeploying the chart and doing many unnecessary 
> steps. A configuration is much simpler (remember, the new strategy is in main 
> only after preliminary testing shows good results). It's just another safety 
> step to decrease the chance of breaking people's production workflows, as a 
> core component is changed. Regarding subclassing `SchedulerJobRunner` - it's 
> a very bad practice. There's absolutely no reason to subclass the entire job 
> class to swap one single component. It's just cumbersome and requires 
> splitting this poor "god class" into even smaller methods nobody understands. 
> If we decide to NOT test the new strategy in main but just replace the 
> current one (I say it's less safe, but possible), then it shouldn't bother us 
> at all at this point whether it's a subclass or a configuration, as it will 
> be taken down anyway.
> We have to focus on finding a good strategy to become the main one, benchmark 
> it and understand the implications of switching to it - I hope #55537 may be 
> a good candidate.
>
> ---
>
> Regarding research papers - I don't think it's so hard to find a strategy 
> that just works for all cases. From an academic viewpoint, we have a very 
> simple case of non-preemptible single-trigger scheduling with priorities that 
> can be solved with one sort and a linear scan. This is basically an entry 
> level leetcode problem. The main difficulty was to find something that works 
> in our case considering:
> 1. The code is in Python
> 2. The tasks are in SQL
> and giving the best performance with fewer network hops.
> I can say we have made great progress, and I'll give a broader description of 
> the new approach we're trying now in a corresponding mail topic later.
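
To make the "one sort and a linear scan" remark concrete, here is a purely 
in-memory sketch. It assumes pool slots are the only limit and ignores the SQL 
side entirely, so it is not the approach taken in #55537. `Task`, 
`select_tasks` and `open_slots` are hypothetical names, not Airflow APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical, simplified view of a schedulable task instance."""

    task_id: str
    pool: str
    priority_weight: int
    pool_slots: int = 1


def select_tasks(
    tasks: list[Task], open_slots: dict[str, int], max_tis: int
) -> list[Task]:
    """Pick up to max_tis tasks, highest priority first, respecting pool slots."""
    remaining = dict(open_slots)  # copy so the caller's mapping is not mutated
    selected: list[Task] = []
    # One sort by priority (stable, so ties keep their input order) ...
    for task in sorted(tasks, key=lambda t: t.priority_weight, reverse=True):
        # ... and one linear scan over the sorted candidates.
        if len(selected) == max_tis:
            break
        if remaining.get(task.pool, 0) >= task.pool_slots:
            remaining[task.pool] -= task.pool_slots
            selected.append(task)
        # A task whose pool is full is skipped without using up the batch
        # limit, so lower-priority tasks in other pools are not starved.
    return selected
```

With one open slot in each of two pools, two high-priority tasks in the first 
pool and one low-priority task in the second, this selects the top task of the 
first pool plus the task in the second pool, instead of stalling on the blocked 
one.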
>
> ---
>
> TL;DR:
> A separation of concerns is highly desired for the scheduler and we should 
> make it BETTER, not WORSE.
> Pluggability is a good thing so everyone can inject things of their own.
> We won't support all kinds of community scheduling strategies in the main 
> tree, to clarify - we won't support any, except the one working well in all 
> cases.
> If we test outside of the main repo, we shouldn't care how the strategy is 
> selected, but inheritance is a messy approach and a pretty bad pattern here.
> Let's focus on solving starvation, and just do the coding right, adhering to 
> SRP and minimizing the maintenance burden.
>
>
> On Monday, September 15th, 2025 at 10:23 PM, Natanel natanelrud...@gmail.com 
> wrote:
>
> > Hello.
> >
> > Asquator and I have already been through this issue, and we have what we
> > think is a decent implementation of a pluggable task selection algorithm
> > for Airflow, which we have implemented here:
> > https://github.com/Asquator/airflow/tree/feature/pessimistic-task-fetching-with-window-function
> >
> > I agree that no perfect solution will ever exist in airflow for all use
> > cases, regarding task selection, which is why this is probably a necessity
> > more than a Nice To Have feature.
> >
> > In the current way we implemented it, we can have a few pre-implemented
> > algorithms that solve different issues, as not all users will encounter
> > all issues. By making them pluggable through a configuration,
> > we can include documentation on when to use a specific task selection
> > algorithm, just like Jarek Potiuk proposed. It will not be customizable,
> > but rather injectable inside of the airflow-core package.
> >
> > Of course there are risks that come along with it, like users abusing it
> > and trying to create a new task selection algorithm for each edge case or
> > use case they have, which can become hard to maintain and follow. However,
> > I do not agree that it makes the code harder to maintain (in terms of code
> > amount) or makes mistakes easier. If implemented correctly, each
> > task selector is independent, acts as a black box, has a simple API,
> > and can be interchanged without any code changes. In my
> > opinion, that makes existing algorithms easier to maintain and removes the
> > need to change a single big and sloppy file (as it is now).
> > In fact, I am certain that making it pluggable will simplify the scheduler
> > altogether, as different parts will then be clearly separated in different
> > files and directories.
> >
> > Allowing injectable algorithms does give more flexibility, and can
> > even make adding the new priority weights algorithm quite simple, without
> > causing any massive changes.
> >
> > The main downside is that we have to choose the API very carefully: once
> > we add it, it will be exceptionally hard to change, as that would mean
> > changing it in multiple places, so it would be considered a breaking
> > change.
> >
> > On Mon, 1 Sept 2025 at 18:36, Christos Bisias christos...@gmail.com wrote:
> >
> > > Hello,
> > >
> > > A while back I started a discussion on the mailing list regarding making
> > > some changes to the task selection query in order to improve the
> > > scheduler's throughput.
> > >
> > > https://github.com/apache/airflow/pull/54103
> > >
> > > Another topic came up during that discussion related to task starvation
> > > due to the current selection algorithm. There are two open PRs with
> > > different fixes for that issue.
> > >
> > > https://github.com/apache/airflow/pull/54284
> > >
> > > https://github.com/apache/airflow/pull/53492
> > >
> > > Everyone has his own needs and it's probable that a good number of users
> > > won't experience the starvation issue.
> > >
> > > Each approach has its own advantages and disadvantages and for that reason
> > > it doesn't feel like there is a right or wrong approach here or a single
> > > solution for all.
> > >
> > > There have been papers on task selection algorithms like this one
> > >
> > > https://ieeexplore.ieee.org/document/9799199
> > >
> > > I would like to suggest refactoring the scheduler so that the task
> > > selection algorithm can be pluggable. The current implementation will be
> > > the default. Everyone will be able to configure the path to his own class.
> > > That will be the most beneficial to the majority of users.
> > >
> > > In the future, anyone could create a PR with his implementation and if
> > > enough people like it, it could be added to the repo.
> > >
> > > This has already been done for the priority weights algorithm, so why not
> > > in this case as well?
> > >
> > > https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule
> > >
> > > If there is positive feedback on this idea, I would like to implement it.
> > >
> > > Please let me know what you think. Thank you!
> > >
> > > Regards,
> > > Christos
