I think this is something we should seriously discuss after 3.1, as many people are busy now tying up loose ends.
But I would generally be in favour of looking at the Scheduler logic, modularising it and making it easier to reason about. I do not want to say that it's bad or "should have been done better", or to claim that modular code is always best (which can easily come across as "someone did a bad job here"). This is absolutely not that. It's very easy to criticise things (even subconsciously) when you come in from outside and see how things "should have been" - but without all the context and history, that often sends a bad message to those who spent years making things work reliably in production for tens of thousands of users, keeping the scheduler generally stable and one of the most reliable parts of Airflow "core". There are PLENTY of reasons the scheduler is implemented the way it is - and even trying to explain the history and decision-making process would take a lot of time. And currently (for good reasons) the scheduler "API" is exactly what Ash explained.

But there is no reason we should not think about it and make a concerted (and group) effort to modularise it and make the scheduler easier to reason about and - importantly - easier for more people to contribute to and discuss - and most importantly - to add far more modularised tests that would allow us to break it up and test parts of it, also for performance and behaviour characteristics. It's not well suited for that today, but - possibly - it could be. And as the **most** important part of Airflow, we should make it easier for many people to understand, reason about and contribute to. Right now there are probably just a few people (Ash being the main one) who can reason about and discuss some intrinsic scheduler insights. And if we want to make Airflow sustainable, we should make it easier for others to understand and contribute to it.
One thing I would love as a result: better documentation of the reasoning behind some decisions and an explanation of how it works (that might come out of such a concerted effort). It's a similar story to CI/CD Breeze two years ago - I was the only one who **really** could reason about it, but through rewriting it in Python, documenting it with ADRs https://github.com/apache/airflow/tree/main/dev/breeze/doc/adr (which still describe some basic assumptions there), engaging others, modularising stuff and getting them to participate, I can now go on a 3-week vacation knowing that things will be taken care of, no matter what (which, BTW, I am doing now).

The "why" and "how" of the scheduler is not really documented. There is this fantastic talk by Ash https://www.youtube.com/watch?v=DYC4-xElccE which still holds and explains it, but I would love to be able to reason and discuss more about it - looking at both code and docs - without reverse-engineering stuff. But I think the goal should be "modularising first" - *maybe* later resulting in an easier way of replacing pieces of the scheduler - so the modularising effort should be guided by the current PRs and the ways they are trying to address, for example, starvation. Doing it slowly, with multiple people reviewing, learning and contributing (and documentation created along the way). I think *that* should be our initial goal ... then **maybe** things will follow.

J.

On Tue, Sep 16, 2025 at 8:02 AM asquator <asqua...@proton.me> wrote:

> will be undocumented*
>
> On Tuesday, September 16th, 2025 at 5:01 PM, asquator <asqua...@proton.me> wrote:
>
> > I see the motivation, but does it have to look so bad?
> >
> > The subclass will look like this:
> >
> > class SchedulerJobRunnerLinearTIScan(SchedulerJobRunner):
> >     def __init__(
> >         self,
> >         job: Job,
> >         num_runs: int = conf.getint("scheduler", "num_runs"),
> >         scheduler_idle_sleep_time: float = conf.getfloat("scheduler", "scheduler_idle_sleep_time"),
> >         log: Logger | None = None,
> >     ):
> >         super().__init__(
> >             job=job,
> >             num_runs=num_runs,
> >             scheduler_idle_sleep_time=scheduler_idle_sleep_time,
> >             log=log,
> >         )
> >         self.task_selector = TASK_SELECTORS[LINEAR_SCAN_SELECTOR]
> >
> > The superclass will use the injected hard-coded task selector.
> >
> > Can't we introduce a configuration hierarchy like `core.internal` and put there things not exposed to the end user? So we don't have to do this weird subclassing?
> >
> > It will look thus:
> >
> > class SchedulerJobRunner(...):
> >     task_selector_type = conf.get("scheduler.internal", "task_selector_strategy")
> >     self.task_selector = TASK_SELECTORS[task_selector_type]
> >
> > We'd just like to have an internal toggle as an implementation detail, which will be undocumented, and custom implementations won't be supported. It's just more convenient and straightforward.
> >
> > Maybe there's another way of internal settings management I missed?
> >
> >
> > On Tuesday, September 16th, 2025 at 11:34 AM, Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > > On 16 Sep 2025, at 08:58, asquator <asqua...@proton.me> wrote:
> > > >
> > > > Yes, exposing pluggable features means fixing an API, which is confining and just hard to do given the current implementation
> > >
> > > class MyScheduler:
> > >     def execute(self):
> > >         while True:
> > >             # Do whatever you want.
> > >
> > > `airflow scheduler --impl=my.module.MyScheduler`
> > >
> > > That is the API.
> > >
> > > That is as pluggable as we need it to be.
> > >
> > > Everything can be built on top of that, including, if you want it, a pluggable task selection mechanism.
> > >
> > > Airflow already has too many config options and ways of tuning behaviour. We need less of them, not more.

> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
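[Editor's note: for readers skimming the thread, the two ideas quoted above - an internal config toggle that picks a task-selection strategy, and Ash's "a scheduler is just a class with an execute() loop, loaded by dotted path" API - can be combined in a small, self-contained sketch. All names below (`conf_get`, `TASK_SELECTORS`, `import_string`, the toy runner) are illustrative stand-ins, not Airflow's real configuration or CLI machinery.]

```python
# Hedged sketch only: stand-ins for the ideas in the thread, not Airflow's
# real config/CLI machinery.
from importlib import import_module
from typing import Callable, Dict, List

# --- the "internal toggle": a registry of task-selection strategies ---
TaskSelector = Callable[[List[str]], List[str]]

TASK_SELECTORS: Dict[str, TaskSelector] = {
    "linear_scan": lambda tis: list(tis),               # keep queue order
    "reversed_scan": lambda tis: list(reversed(tis)),   # toy alternative
}

# Stand-in for conf.get("scheduler.internal", "task_selector_strategy")
_INTERNAL_CONF = {("scheduler.internal", "task_selector_strategy"): "linear_scan"}

def conf_get(section: str, key: str) -> str:
    return _INTERNAL_CONF[(section, key)]

class SchedulerJobRunner:
    """Toy runner that picks its task selector from an internal setting."""

    def __init__(self, max_loops: int = 1) -> None:
        strategy = conf_get("scheduler.internal", "task_selector_strategy")
        self.task_selector = TASK_SELECTORS[strategy]
        self.max_loops = max_loops  # a real scheduler loops until shut down

    def execute(self, tis: List[str]) -> List[str]:
        # "while True: do whatever you want" - bounded here so it terminates.
        selected: List[str] = []
        for _ in range(self.max_loops):
            selected = self.task_selector(tis)
        return selected

# --- the "--impl=my.module.MyScheduler" idea: load the runner by dotted path ---
def import_string(dotted_path: str):
    module_path, _, class_name = dotted_path.rpartition(".")
    return getattr(import_module(module_path), class_name)
```

Under this shape, a CLI like `airflow scheduler --impl=my.module.MyScheduler` would only need `import_string(impl)(...).execute(...)`; everything else - including pluggable task selection - lives behind that single entrypoint, which is the point Ash makes above.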