Re Daniel:
Pushing the logic to sprocs is a brave venture indeed. The idea came to me as a last resort after long frustration with other techniques. We're doing it because we've seemingly tried everything else and _failed miserably_. Of course there is a big maintenance question, and I'll hold my detailed comments until I've made more improvements. It IS a tougher sell, but compared to our previous attempt with WFs I see three improvements here:

1. Maintainability (adding new limits)
2. Performance
3. Solved cases

I will post on the main mailing list thread devoted to starvation when it's solid and ready for testing. The PR is already open for review.

Re Jarek:
Thanks for the direction, we'll do our best to follow the path of incremental changes and hopefully arrive at a *neat* solution in the end.

On Tuesday, September 16th, 2025 at 8:42 PM, Daniel Standish via dev <dev@airflow.apache.org> wrote:

> I agree with Jarek's comments.
>
> Re some of your comments...
>
> Scheduler can be improved. Scheduler should be improved. Incremental is
> good.
>
> Nobody is going to dispute that breaking things up into components that are
> easier to understand and maintain is a good goal.
>
> Nobody is going to object to running more stuff in parallel if it is
> feasible and produces better performance.
>
> Re your specific PR
>
> I think that your idea to push the logic down to stored procs though is a
> tougher sell. For one, you need to add a different sproc per dialect. And
> the length of the one sproc you wrote is about the same as the long
> function that you are trying to replace. SQL is also a worse language for
> intelligibility and modularization so it goes in the opposite direction.
> Your sproc is tougher to inspect than the function it replaces IMHO.
>
> We obviously cannot write off the idea of using sprocs and pushing more
> logic to SQL. But there are real tradeoffs and the case has to be made
> that it's worth it. The sproc approach does appear to be worse to
> maintain, tougher to test, and more error prone. And to make changes
> requires a deployment --- not just patching code. So, while a case could
> be made that it is worth it, the case has to be made.
>
>
> On Tue, Sep 16, 2025 at 10:33 AM Jarek Potiuk ja...@potiuk.com wrote:
>
> > It's just that the very basic thing here is 'community over code' - we
> > believe that good code is the result of building a great community that
> > works together. And that the opposite is not necessarily true.
> >
> > In this case I identify the issue not as 'code is bad' but as 'let's build
> > a community of people collaborating on making it better' - where careful
> > refactoring with participation of multiple parties and especially building
> > on all the experience and accumulated years of wisdom of people who worked
> > on it is crucial. Because individuals who have even the best intentions to
> > make things 'better' might just miss a lot of context (like for example the
> > open-source context where we have to be careful what we expose to users -
> > lessons we learned the hard way over the many years Airflow has been
> > maintained by us).
> >
> > And it simply closes off cooperation rather than opening it up if we
> > start with a goal that is too ambitious and has a "let's rip it all out,
> > it's badly designed" vibe to it.
> >
> > I think it's better to see the modularity / customisability as a
> > potentially good end goal and start with incremental improvements that
> > will also address the imperfect explanation of context and decisions and
> > get people enthused and collaborating on it.
> >
> > J.
> >
> > On Tue, 16 Sep 2025, 11:13, asquator asqua...@proton.me wrote:
> >
> > > Well, it's always new contributors who criticize the old code, because
> > > they're affected the most.
> > > It's not about blaming people, but noticing things "at the margin",
> > > ignoring the history and proposing changes that are objectively
> > > beneficial at the moment. I did watch Ash's talk before starting to work
> > > on the starvation issue, and it's great. I have nothing to say about the
> > > algorithm and the logic (maybe except that it should be asynchronous),
> > > only about the code structure. Over the years, small patches have been
> > > added to one file, and now it has exploded. It's normal. It happens. We
> > > just have to realize that and start splitting the code, without
> > > drastically changing the logic.
> > >
> > > Practice shows that modular code is always better than monolithic scripts
> > > with multiple responsibilities.
> > > I'm not saying we should abstract every small part of the code. The
> > > "cheapest" way to decide which parts should be abstracted is the simple
> > > principle of "if the component has to be replaced once, it can be
> > > replaced again in the future".
> > > As we're working on the critical section, we're ready to do the extra
> > > work to abstract it out and make it pluggable (for developers) - it can
> > > be the first step in modularizing the scheduler, and one we're ready to
> > > take.
> > >
> > > As I have to implement the PR, I'm willing to dive into the tactical
> > > level and reach a compromise with the PMCs (who know this project's
> > > internals much better) regarding what's the best option to inject things
> > > internally.
> > > It's a purely technical problem I have to solve to have a working PR. I
> > > will do the subclass for testing because it's quick and dirty, but it
> > > would be nice to come to a better solution (it shouldn't be too hard) so
> > > we can do it the best way before it's (hopefully) merged.
> > >
> > > On Tuesday, September 16th, 2025 at 7:08 PM, Jarek Potiuk <
> > > ja...@potiuk.com> wrote:
> > >
> > > > I think this is something we should seriously discuss after 3.1 as many
> > > > people are busy now with tying up loose ends.
> > > >
> > > > But I would be generally in favour of looking at Scheduler logic,
> > > > modularising it and making it easier to reason about. I do not want to
> > > > (absolutely) say that it's bad or "should have been done better", or
> > > > to say that modular code is always best (which might often be taken as
> > > > implying "someone did a bad job here"). This is absolutely not that.
> > > > It's very easy to criticise things (even subconsciously) when you come
> > > > from outside and you see how things "should have been" - but without
> > > > all the context and history, this is often bad messaging to those who
> > > > spent years on making things work reliably in production for tens of
> > > > thousands of users, and on it being generally stable and one of the
> > > > most reliable parts of Airflow "core".
> > > >
> > > > There are PLENTY of reasons for the scheduler being implemented the way
> > > > it is - and even trying to approach explaining the history and decision
> > > > making process would take a lot of time. And currently (for good
> > > > reasons) the scheduler "API" is exactly what Ash explained.
> > > >
> > > > But there is no reason we should not think and make a concerted (and
> > > > group) effort to modularise it and make the scheduler easier to reason
> > > > about and - importantly - easier for more people to contribute to and
> > > > discuss - and most importantly - add way more modularised tests that
> > > > would also allow us to break it up and test parts of it - also for
> > > > performance and behaviour characteristics. It's not well suited for it
> > > > today, but - possibly - it could be. And as the most important part of
> > > > Airflow, we should make it easier for many people to understand, reason
> > > > about and contribute to. Right now there are probably just a few people
> > > > (Ash being the main person) that can reason about and discuss some
> > > > intrinsic scheduler insights. And if we want to make Airflow
> > > > sustainable, we should make it easier for others to understand and
> > > > contribute to it.
> > > >
> > > > One of the things I would love as a result of it is to have it better
> > > > documented, with the reasoning behind some decisions and an explanation
> > > > of how it works (that might be a result of such a concerted effort).
> > > >
> > > > It's a similar story as with CI/CD Breeze two years ago - I was the
> > > > only one who really could reason about it, but through rewriting in
> > > > Python and documenting with ADRs
> > > > https://github.com/apache/airflow/tree/main/dev/breeze/doc/adr
> > > > (which still describe some basic assumptions there) and engaging
> > > > others, modularising stuff and getting them to participate, I can now
> > > > go on a 3-week vacation knowing that things will be taken care of, no
> > > > matter what (which, BTW, I am doing now).
> > > >
> > > > The "why" and "how" of the scheduler is not really documented. There is
> > > > this fantastic talk by Ash
> > > > https://www.youtube.com/watch?v=DYC4-xElccE
> > > > which still holds and explains it, but I would love to be able to
> > > > reason and discuss more about it - looking at both code and docs -
> > > > without reverse-engineering stuff.
> > > >
> > > > But I think the goal should be "modularising first" - maybe resulting
> > > > later in an easier way of replacing pieces of the scheduler, so the
> > > > modularising effort should be guided by the current PRs and the ways
> > > > they are trying to address starvation, for example. Doing it slowly,
> > > > with multiple people reviewing, learning and contributing (and
> > > > documentation created on the way).
> > > >
> > > > I think that should be our initial goal ... then maybe things will
> > > > follow.
> > > >
> > > > J.
> > > >
> > > > On Tue, Sep 16, 2025 at 8:02 AM asquator asqua...@proton.me wrote:
> > > >
> > > > > will be undocumented*
> > > > >
> > > > > On Tuesday, September 16th, 2025 at 5:01 PM, asquator
> > > > > asqua...@proton.me
> > > > > wrote:
> > > > >
> > > > > > I see the motivation, but does it have to look so bad?
> > > > > >
> > > > > > The subclass will look like this:
> > > > > >
> > > > > > class SchedulerJobRunnerLinearTIScan(SchedulerJobRunner):
> > > > > >     def __init__(
> > > > > >         self,
> > > > > >         job: Job,
> > > > > >         num_runs: int = conf.getint("scheduler", "num_runs"),
> > > > > >         scheduler_idle_sleep_time: float = conf.getfloat("scheduler", "scheduler_idle_sleep_time"),
> > > > > >         log: Logger | None = None,
> > > > > >     ):
> > > > > >         super().__init__(
> > > > > >             job=job,
> > > > > >             num_runs=num_runs,
> > > > > >             scheduler_idle_sleep_time=scheduler_idle_sleep_time,
> > > > > >             log=log,
> > > > > >         )
> > > > > >         self.task_selector = TASK_SELECTORS[LINEAR_SCAN_SELECTOR]
> > > > > >
> > > > > > The super class will use the injected hard-coded task selector.
> > > > > >
> > > > > > Can't we introduce a configuration hierarchy like `core.internal`
> > > > > > and put there things not exposed to the end user? So we don't have
> > > > > > to do this weird subclassing?
> > > > > >
> > > > > > It will look thus:
> > > > > >
> > > > > > class SchedulerJobRunner(...):
> > > > > >     task_selector_type = conf.get("scheduler.internal", "task_selector_strategy")
> > > > > >     self.task_selector = TASK_SELECTORS[task_selector_type]
> > > > > >
> > > > > > We'd just like to have an internal toggle as an implementation
> > > > > > detail, which won't be undocumented and custom implementations
> > > > > > won't be supported.
> > > > > > It's just more convenient and straightforward.
> > > > > >
> > > > > > Maybe there's another way of internal settings management I missed?
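
For readers following the two options quoted above, here is a minimal, self-contained sketch of the underlying pattern: a registry of selector strategies, with the runner picking one either through a subclass or through a settings key. It is not Airflow code. `TASK_SELECTORS` and `LINEAR_SCAN_SELECTOR` are names taken from the quoted snippet, but the selector signature, the `TaskInstance` stand-in, the `Runner` classes and `runner_from_setting` are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TaskInstance:
    """Hypothetical stand-in for a schedulable task instance."""
    task_id: str
    priority: int


# Assumed selector interface: (candidates, max_tis) -> task instances to queue.
TaskSelector = Callable[[list[TaskInstance], int], list[TaskInstance]]


def priority_selector(candidates: list[TaskInstance], max_tis: int) -> list[TaskInstance]:
    # Highest priority first; sorted() is stable, so ties keep input order.
    return sorted(candidates, key=lambda ti: -ti.priority)[:max_tis]


def linear_scan_selector(candidates: list[TaskInstance], max_tis: int) -> list[TaskInstance]:
    # Take candidates in the order they arrive, up to the limit.
    return candidates[:max_tis]


DEFAULT_SELECTOR = "priority"        # hypothetical key
LINEAR_SCAN_SELECTOR = "linear_scan"  # key name borrowed from the thread

TASK_SELECTORS: dict[str, TaskSelector] = {
    DEFAULT_SELECTOR: priority_selector,
    LINEAR_SCAN_SELECTOR: linear_scan_selector,
}


class Runner:
    """Hypothetical runner; the strategy is fixed by a class attribute."""
    selector_name = DEFAULT_SELECTOR

    def __init__(self) -> None:
        self.task_selector = TASK_SELECTORS[self.selector_name]

    def pick(self, candidates: list[TaskInstance], max_tis: int) -> list[TaskInstance]:
        return self.task_selector(candidates, max_tis)


class LinearScanRunner(Runner):
    """Option 1: inject the strategy by subclassing."""
    selector_name = LINEAR_SCAN_SELECTOR


def runner_from_setting(name: str) -> Runner:
    """Option 2: choose the strategy from an (internal) settings value."""
    if name not in TASK_SELECTORS:
        raise ValueError(f"unknown task selector: {name!r}")
    runner = Runner()
    runner.task_selector = TASK_SELECTORS[name]
    return runner
```

Both options end up with the same `task_selector` attribute on the runner; they differ only in where the choice is made, which is the trade-off the thread is debating.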
> > > > > >
> > > > > > On Tuesday, September 16th, 2025 at 11:34 AM, Ash Berlin-Taylor
> > > > > > a...@apache.org wrote:
> > > > > >
> > > > > > > > On 16 Sep 2025, at 08:58, asquator asqua...@proton.me wrote:
> > > > > > > >
> > > > > > > > Yes, exposing pluggable features means fixing an API, which is
> > > > > > > > confining and just hard to do given the current implementation
> > > > > > >
> > > > > > > class MyScheduler:
> > > > > > >     def execute(self):
> > > > > > >         while True:
> > > > > > >             # Do what ever you want.
> > > > > > >
> > > > > > > `airflow scheduler --impl=my.module.MyScheduler`
> > > > > > >
> > > > > > > That is the API.
> > > > > > >
> > > > > > > That is as pluggable as we need it to be.
> > > > > > >
> > > > > > > Everything can be built on top of that, including, if you want
> > > > > > > it, a pluggable task selection mechanism.
> > > > > > >
> > > > > > > Airflow already has too many config options and ways of tuning
> > > > > > > behaviour. We need less of them, not more.
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > For additional commands, e-mail: dev-h...@airflow.apache.org