Re: [Discussion] Make the scheduler's task selection algorithm pluggable

asquator Wed, 17 Sep 2025 11:27:03 -0700

Well, it's always new contributors who criticize the old code, because they're 
affected the most. 
It's not about blaming people, but noticing things "at the margin", ignoring 
the history and proposing
changes that are objectively beneficial *at the moment*. I did watch Ash's talk 
before starting to work 
on the starvation issue, and it's great. I have nothing to say about the 
algorithm and the logic
(mb except it should be asynchronous), but the code structure. Over the years, 
small patches have been added 
to one file, and now it has exploded. It's normal. It happens. We just have to 
realize that and start 
splitting the code, without drastically changing the logic.


Practice shows that modular code is always better than monolithic scripts with 
multiple responsibilities.
I don't say abstract every small part of the code. The "cheapest" way to decide 
which parts should
be abstracted is the simple principle of "if the component has to be replaced 
once, it can be replaced again in the future".
As we're working on the critical section, we're ready to do the extra work to 
abstract it out and make it
pluggable (for developers) - it can be the first step in modularizing the 
scheduler which we're ready to take.

As I have to implement the PR, I'm willing to dive into the tactical level and 
reach a compromise with the PMCs
(who know this project's internals much better) regarding what's the best 
option to inject things internally.
It's a purely technical problem I have to solve to have a working PR. I *will* 
do the subclass for testing because
it's quick and dirty, but it would be nice to come to a better solution (it 
shouldn't be too hard) so we can
do it the best way before it's (hopefully) merged.



On Tuesday, September 16th, 2025 at 7:08 PM, Jarek Potiuk <[email protected]> 
wrote:

> I think this is something we should seriously discuss after 3.1 as many
> people are busy now with tying up loose ends.
> 
> But I would be generally in favour of looking at Scheduler logic,
> modularising it and making it easier to reason about. I do not want to
> (absolutely) say that it's bad or "should have been done better" or
> anything to say always modular code is best (which often might be implied
> as "someone did a bad job here"). This is absolutely not that. It's very
> easy to criticise things (even subconsciously) when you come out from
> outside and you see how things "should have been" - but without all the
> context and history, this is often a bad messaging to those who spent years
> on making thing work reliably in production for tens of thousands of users
> and being generally stable and one of the most reliable part of Airflow
> "core".
> 
> There are PLENTY of reasons for scheduler being implemented the way it is -
> and even trying to approach explaining the history and decision making
> process would take a lot of time. And currently (for good reasons) the
> scheduler "API" is exactly what Ash explained.
> 
> But there is no reason we should not think and make a concerted (and group)
> effort to modularise it and make scheduler easier to reason about and -
> importantly - being easier for more people to contribute to and discuss it
> - and most importantly - adding way more modularised tests that would also
> allow us to break it up and tests parts of it - also for performance and
> behaviour characteristics. It's not well suited for it today, but -
> possibly - it could. And as the most important part of Airflow, we
> should make it easier to understand, reason about and contribute to by many
> people. Simply now there are probably just a few people (Ash being the main
> person) that can reason and discuss some intrinsic scheduler insights. And
> If we want to make Airflow sustainable, we should make it easier for others
> to understand and contribute to it.
> 
> One of the things as a result of it - I would love to have it better
> documented and explained the reasoning behind some decisions and explaining
> how it works (that might be a result of such a concerted effort).
> 
> It's a similar story as with Ci/CD Breeze two years ago - I was the only
> one who really could reason about it but through rewriting in Python
> and documenting with ADRs
> https://github.com/apache/airflow/tree/main/dev/breeze/doc/adr which still
> describe some basic assumptions there and engaging others, modularising
> stuff and getting them to participate I can go now for 3 weeks vacations
> knowing that things will be taken care of, no matter what (Which BTW. I am
> doing now).
> 
> The "why" and "how" scheduler works is not really documented. There is this
> fantastic talk by Ash https://www.youtube.com/watch?v=DYC4-xElccE which
> still holds and explains it, but I would love to be able to reason and
> discuss more about it - looking at both code and docs - without
> reverse-engineering stuff.
> 
> But I think the goal should be "modularising first" - maybe resulting
> later in easier way of replacing pieces of scheduler, so the modularising
> effort should be guided by the current PRs and ways they are trying to
> address starvation for example. Doing it slowly, with mutliple people
> reviewing, learning and contributing (and documentation created on the way).
> 
> I think that should be our initial goal ... then maybe things will
> follow.
> 
> J.
> 
> 
> 
> On Tue, Sep 16, 2025 at 8:02 AM asquator [email protected] wrote:
> 
> > will be undocumented*
> > 
> > On Tuesday, September 16th, 2025 at 5:01 PM, asquator [email protected]
> > wrote:
> > 
> > > I see the motivation, but does it have to look so bad?
> > > 
> > > The subclass will look like this:
> > > 
> > > class SchedulerJobRunnerLinearTIScan(SchedulerJobRunner):
> > > def init(
> > > self,
> > > job: Job,
> > > num_runs: int = conf.getint("scheduler", "num_runs"),
> > > scheduler_idle_sleep_time: float = conf.getfloat("scheduler",
> > > "scheduler_idle_sleep_time"),
> > > log: Logger | None = None,
> > > ):
> > > super().init(
> > > job=job,
> > > num_runs=num_runs,
> > > scheduler_idle_sleep_time=scheduler_idle_sleep_time,
> > > log=log,
> > > )
> > > self.task_selector = TASK_SELECTORS[LINEAR_SCAN_SELECTOR]
> > > 
> > > The super class will use the injected hard-coded task selector.
> > > 
> > > Can't we introduce a configuration hierarchy like `core.internal` and
> > > put there things not exposed to the end user? So we don't have to do this
> > > weird subclassing?
> > > 
> > > It will look thus:
> > > 
> > > class SchedulerJobRunner(...):
> > > task_selector_type = conf.get("scheduler.internal",
> > > "task_selector_strategy")
> > > self.task_selector = TASK_SELECTORS[task_selector_type]
> > > 
> > > We'd just like to have an internal toggle as an implementation detail,
> > > which won't be undocumented and custom implementations won't be supported.
> > > It's just more convenient and straightforward.
> > > 
> > > Mb there's another way of internal settings management I missed?
> > > 
> > > On Tuesday, September 16th, 2025 at 11:34 AM, Ash Berlin-Taylor
> > > [email protected] wrote:
> > > 
> > > > > On 16 Sep 2025, at 08:58, asquator [email protected] wrote:
> > > > > 
> > > > > Yes, exposing pluggable features means fixing an API, which is
> > > > > confining and just hard to do given the current implementation
> > > > 
> > > > class MyScheduler:
> > > > def execute(self):
> > > > while True:
> > > > # Do what ever you want.
> > > > 
> > > > `airflow scheduler --impl=my.module.MyScheduler`
> > > > 
> > > > That is the API.
> > > > 
> > > > That is as pluggable as we need it to be.
> > > > 
> > > > Everything can be built on top of that, including if you want it, a
> > > > pluggable task selection mechanisms.
> > > > 
> > > > Airflow already has too many config options and ways of tuning
> > > > behaviour. We need less of them, not more.
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [Discussion] Make the scheduler's task selection algorithm pluggable

Reply via email to