Some parts got swallowed by the markdown blockquotes. Those reading on the Apache website, please unwrap them.
On Tuesday, September 16th, 2025 at 1:08 AM, asquator <asqua...@proton.me> wrote:

> Hello!
>
> First some updates regarding the #54392 PR:
> Contributions to the PR have been halted. See the PR itself for more information.
> A new PR was opened to address the general problem of starvation, utilizing stored SQL functions/procedures, and any reviews are welcome:
> https://github.com/apache/airflow/pull/55537
>
> My position on a pluggable scheduler is that every piece of software, especially complex software, must be split into smaller, independent components that are made pluggable, whether internally (bootstrap files) or through configuration. It has been said above that the scheduler's code is exceptionally "complex", and I completely disagree with that. It's not complex but cumbersome, dirty, overloaded and highly monolithic. We have a function called `_executable_task_instances_to_queued` with 355 (!) lines and 4 (!) levels of nesting. This violates ANY normal clean code standard, which is kind of... BAD. This is what makes the scheduler "complex", difficult to change, and difficult for newcomers to step into. This was just one example, but the entire class is written like that. Sometimes I have a feeling it has been intentionally sabotaged to look this way, and it's sad.
>
> > Roughly speaking the scheduler has three main responsibilities
>
> Exactly! This is a big problem for the SRP. The scheduler should be a facade that just triggers different steps, instead of one large incomprehensible `while True: do_everything()` script as it looks now. IMO the independent steps should even run asynchronously instead of the current sequential execution. It will both make the code cleaner and produce more efficient results. One class should not do "three main responsibilities". Never.
>
> Over time the industry requirements will shift towards running millions and tens of millions of tasks daily, and new solutions will be required to support these requirements. The way things go today, it will be very hard to introduce global changes. The scheduler code looks "complex" because it was made so. Inherently it's very simple logic - query the tasks, loop over them and log some stuff; we just have too much detail in one file and it's frustrating. For the sake of the SRP I think we must split the scheduler one day, and any friction blocking this refactor is another nail in the project's maintainability coffin.
>
> A complete refactor will be a hard thing to do, so incremental changes are much more feasible to introduce. Task selection logic is an important part that should be taken out to another component. Here we both fix the starvation and do a good thing for the project instead of burying it even deeper.
>
> ---
>
> Now that we're done with the clean code topic, let's talk about the maintenance overhead so feared by maintainers.
> I claim that plugin architecture does not inherently mean more effort to support any kind of community implementations.
>
> > There is absolutely no way we can make it available for users to override and use their own implementation - because we will have to support whatever someone implemented.
>
> No. This is absolutely wrong. We won't accept any kind of implementation that solves some specific edge case - not at all. The main branch will include just one (at most two) generally accepted and tested implementations. If someone feels like writing their own version - let them do that in their fork for their business needs.
> It should never be in main until it's useful for the entire community. If someone needs their specific behavior - let them do it, we won't support it as it's in their fork.
> Plugin architecture means the ability to quickly swap one subcomponent for another, not the necessity to support all kinds of plugins. We just define a single API and stick to it. We've been researching the starvation problem for half a year now and tried all kinds of fixes. Until the component was made pluggable, it was a real pain to try anything new.
>
> Let's connect it to our case:
> We have the #54284 PR which is designed to solve a particular issue @dstandish described. If this logic solves the problem for them, I have no objection to their adoption of this strategy as a custom plugin. I don't see how it can be merged into main, because they did a very particular fix that won't work for everybody - it will be a burden for the devs, but may be a salvation for their team. My position here is making it easy for them to switch to this strategy using plugin architecture, without ever taking responsibility for their code. My team experienced a similar issue but for pools instead of DAGs. We've been considering creating a patch like #54284, but we dug deeper and found the root of the problem, so this patch was never created. I agree, we shouldn't pollute the repo with small patches - it will be hell.
>
> We also have the #55537 PR which is designed to solve the issue for everyone. As this implementation claims to replace the current, optimistic scheduler (claiming to be "just better"), I think it can certainly coexist with the optimistic one for a release or so. The steps are:
> 1. Testing and benchmarks outside the main tree (by enthusiasts)
> 2. Merging and wide testing by the community, with the ability to switch back on failure
> 3. Deprecation of the optimistic strategy in case the new strategy is really "just better"
>
> To be honest, I don't care at all if the testing is done out of main (it's reasonable), but IMO the second step is still desirable because we cannot expect everyone to test their workflows with the new strategy in the fork. It implies switching repos, redeploying the chart and doing many unnecessary steps. A configuration is much simpler (remember, the new strategy is in main only after preliminary testing shows good results). It's just another safety step to decrease the chance of breaking people's production workflows, as a core component is changed.
>
> Regarding subclassing `SchedulerJobRunner` - it's a very bad practice. There's absolutely no reason to subclass the entire job class to swap one single component. It's just cumbersome and requires splitting this poor "god class" into even smaller methods nobody understands. If we decide to NOT test the new strategy in main but just replace the current one (I say it's less safe, but possible), then it shouldn't bother us at all at that point - whether it's a subclass or a configuration - as it will be taken down anyway.
> We have to focus on finding a good strategy to become the main one, benchmark it and understand the implications of switching to it - I hope #55537 may be a good candidate.
>
> ---
>
> Regarding research papers - I don't think it's so hard to find a strategy that just works for all cases. From an academic viewpoint, we have a very simple case of non-preemptible single-trigger scheduling with priorities that can be solved with one sort and a linear scan. This is basically an entry level leetcode problem. The main difficulty was to find something that works in our case considering:
> 1. The code is in Python
> 2. The tasks are in SQL
> and giving the best performance with fewer network hops.
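
The "one sort and a linear scan" idea above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Task` fields, the `select_tasks` function and the pool names are hypothetical, pool slots are the only limit modeled, and the data lives in memory rather than in SQL, so it does not address the Python/SQL and network-hop constraints listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical, simplified stand-in for a schedulable task instance.
    task_id: str
    pool: str
    priority: int  # assumption: a higher value means "run earlier"


def select_tasks(tasks, free_slots, max_tasks):
    """Pick up to max_tasks tasks by descending priority, respecting pool slots."""
    remaining = dict(free_slots)  # copy so the caller's mapping is not mutated
    selected = []
    # One sort: highest priority first (sorted() is stable, so ties keep input order).
    for task in sorted(tasks, key=lambda t: -t.priority):
        if len(selected) >= max_tasks:
            break
        # Linear scan: a task whose pool is full is skipped, not waited on,
        # so it cannot block lower-priority tasks in other pools.
        if remaining.get(task.pool, 0) > 0:
            remaining[task.pool] -= 1
            selected.append(task)
    return selected


candidates = [
    Task("a", "pool_1", 10),
    Task("b", "pool_1", 9),
    Task("c", "pool_1", 8),
    Task("d", "pool_2", 1),
]
picked = select_tasks(candidates, {"pool_1": 1, "pool_2": 1}, max_tasks=2)
print([t.task_id for t in picked])  # ['a', 'd']
```

In this example `b` and `c` are skipped because `pool_1` is already full, but the scan keeps going and still picks the low-priority `d`, which is the behavior a starvation fix is after.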
> I can say we've made great progress, and I'll give a broader description of the new approach we're trying now in a corresponding mail topic later.
>
> ---
>
> TL;DR:
> A separation of concerns is highly desired for the scheduler and we should make it BETTER, not WORSE.
> Pluggability is a good thing so everyone can inject things of their own.
> We won't support all kinds of community scheduling strategies in the main tree, to clarify - we won't support any, except the one working well in all cases.
> If we test outside of the main repo, we shouldn't care how the strategy is selected, but inheritance is a messy approach and a pretty bad pattern here.
> Let's focus on solving starvation, and just do the coding right, adhering to SRP and minimizing the maintenance burden.
>
> On Monday, September 15th, 2025 at 10:23 PM, Natanel natanelrud...@gmail.com wrote:
>
> > Hello.
> >
> > Asquator and I have already been through this issue, and we have what we think is a decent implementation of a pluggable task selection algorithm for Airflow, which we have implemented here:
> > https://github.com/Asquator/airflow/tree/feature/pessimistic-task-fetching-with-window-function
> >
> > I agree that no perfect solution will ever exist in Airflow for all use cases regarding task selection, which is why this is probably a necessity more than a nice-to-have feature.
> >
> > The way we currently implemented it, we can have a few pre-implemented algorithms that solve different issues, as not all users will encounter all issues. By making them pluggable correctly, with a configuration, we can include documentation on when to use a specific task selection algorithm, just like Jarek Potiuk proposed. It will not be customizable, but rather injectable inside of the airflow-core package.
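
A minimal sketch of what "injectable, but not customizable" could look like: a small fixed set of built-in selectors behind one interface, chosen by a configured name. Everything here is hypothetical (the `TaskSelector` protocol, both selector classes, the `SELECTORS` registry and the idea of a name-valued setting); it is not Airflow's API, and a real selector would work against a database query rather than a list of dicts.

```python
from typing import Protocol


class TaskSelector(Protocol):
    """Hypothetical single API that every built-in selector implements."""

    def select(self, candidates: list, limit: int) -> list: ...


class PrioritySelector:
    # Highest priority first; ties keep their input order.
    def select(self, candidates: list, limit: int) -> list:
        return sorted(candidates, key=lambda t: -t["priority"])[:limit]


class FifoSelector:
    # Oldest queued task first, ignoring priority.
    def select(self, candidates: list, limit: int) -> list:
        return sorted(candidates, key=lambda t: t["queued_at"])[:limit]


# Fixed registry of shipped implementations: users pick a name, they do not
# point the scheduler at arbitrary code of their own.
SELECTORS = {"priority": PrioritySelector, "fifo": FifoSelector}


def get_selector(name: str) -> TaskSelector:
    """Resolve a configured selector name to an instance."""
    try:
        return SELECTORS[name]()
    except KeyError:
        raise ValueError(
            f"unknown task selector {name!r}; choose from {sorted(SELECTORS)}"
        ) from None


tasks = [
    {"id": "a", "priority": 1, "queued_at": 1},
    {"id": "b", "priority": 5, "queued_at": 2},
]
# The calling code stays the same whichever selector the setting names.
selector = get_selector("priority")
print([t["id"] for t in selector.select(tasks, limit=1)])  # ['b']
```

Because the caller only depends on `select(candidates, limit)`, swapping `"priority"` for `"fifo"` changes behavior without touching the scheduler loop, which is also why the interface would be hard to change once published.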
> >
> > Of course there are risks that come along with it, like users abusing it and trying to create a new task selection algorithm for each edge case or use case they have, which can become hard to maintain and follow. However, I do not agree that it makes the code harder to maintain (in terms of code amount) or makes mistakes easier. If implemented correctly, each task selector is independent, acts as a black box, has a simple API, and can be interchanged without any code changes, which makes it, in my opinion, easier to maintain existing algorithms and removes the need to change a single big and sloppy file (as it is now).
> > In fact, I am certain that making it pluggable will simplify the scheduler altogether, as different parts will then be clearly separated in different files and directories.
> >
> > Allowing injectable algorithms does give more flexibility, and can even make adding the new priority weights algorithm quite simple, without causing any massive changes.
> >
> > The main downside is that we have to choose the API very carefully: once we add it, it will be exceptionally hard to change, as that would mean changing it in multiple places and would be considered a breaking change.
> >
> > On Mon, 1 Sept 2025 at 18:36, Christos Bisias christos...@gmail.com wrote:
> >
> > > Hello,
> > >
> > > A while back I started a discussion on the mailing list regarding making some changes to the task selection query in order to improve the scheduler's throughput.
> > >
> > > https://github.com/apache/airflow/pull/54103
> > >
> > > Another topic came up during that discussion related to task starvation due to the current selection algorithm. There are two open PRs with different fixes for that issue.
> > >
> > > https://github.com/apache/airflow/pull/54284
> > >
> > > https://github.com/apache/airflow/pull/53492
> > >
> > > Everyone has his own needs and it's probable that a good number of users won't experience the starvation issue.
> > >
> > > Each approach has its own advantages and disadvantages and for that reason it doesn't feel like there is a right or wrong approach here or a single solution for all.
> > >
> > > There have been papers on task selection algorithms like this one
> > >
> > > https://ieeexplore.ieee.org/document/9799199
> > >
> > > I would like to suggest refactoring the scheduler so that the task selection algorithm can be pluggable. The current implementation will be the default. Everyone will be able to configure the path to his own class. That will be the most beneficial to the majority of users.
> > >
> > > In the future, anyone could create a PR with his implementation and if enough people like it, it could be added to the repo.
> > >
> > > This has already been done for the priority weights algorithm, so why not in this case as well?
> > >
> > > https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule
> > >
> > > If there is positive feedback on this idea, I would like to implement it.
> > >
> > > Please let me know what you think. Thank you!
> > >
> > > Regards,
> > > Christos