I see a bit of a risk here, as the scheduler code is quite complex. Similar to Jarek, I assume that if somebody sees this and plugs in their own implementation, in most cases it will make things worse. It also locks us into a plugin API and removes flexibility if we need to change or refactor something.

On the other hand, I also fear that, because the scheduler is already very complex, adding multiple parallel strategies creates redundant code paths that are hard to maintain: load tests etc. must validate that neither degrades, and features need to be added to both.

So I'd favor keeping a single (maybe configurable) logic.

Unfortunately I have not had the mental capacity to drill into the discussion and details so far; the beast of SQL code that was shared frightened me a bit.

Jens

On 13.09.25 07:06, Jarek Potiuk wrote:
I think, even if we do it - this should only be something internal. I don't
see why we should make it customizable. If we want to choose between
different algorithms we should explicitly tell users why they should choose
different algorithms and make sure we have data backing it up. There is
absolutely no way we can make it available for users to override and use
their own implementation - because we will have to support whatever someone
implemented.

On Thu, Sep 4, 2025 at 3:08 PM Christos Bisias <christos...@gmail.com>
wrote:

I’d appreciate any feedback on this.

On Mon, Sep 1, 2025 at 18:35 Christos Bisias <christos...@gmail.com>
wrote:

Hello,

A while back I started a discussion on the mailing list regarding making
some changes to the task selection query in order to improve the
scheduler's throughput.

https://github.com/apache/airflow/pull/54103

Another topic came up during that discussion related to task starvation
due to the current selection algorithm. There are two open PRs with
different fixes for that issue.

https://github.com/apache/airflow/pull/54284

https://github.com/apache/airflow/pull/53492

Everyone has their own needs, and it's likely that a good number of users
won't experience the starvation issue.

Each approach has its own advantages and disadvantages, and for that reason
it doesn't feel like there is a right or wrong approach here or a single
solution for all.

There have been papers on task selection algorithms, such as this one:

https://ieeexplore.ieee.org/document/9799199

I would like to suggest refactoring the scheduler so that the task
selection algorithm can be pluggable. The current implementation will be
the default. Everyone will be able to configure the path to their own
class. I think that would be the most beneficial option for the majority
of users.
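
A minimal sketch of what a pluggable selector configured by class path could look like. Everything here is an assumption for illustration only: `TaskSelector`, `PriorityTaskSelector`, `QueuedTask`, and `load_selector` are hypothetical names, not existing Airflow APIs, and the real selection logic is a SQL query in the scheduler rather than an in-memory sort.

```python
from __future__ import annotations

import importlib
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedTask:
    """Hypothetical stand-in for a task instance waiting to be scheduled."""

    task_id: str
    priority_weight: int


class TaskSelector(ABC):
    """Hypothetical plug-in interface (not an existing Airflow class)."""

    @abstractmethod
    def select(self, candidates: list[QueuedTask], max_tis: int) -> list[QueuedTask]:
        """Return at most ``max_tis`` tasks to queue next."""


class PriorityTaskSelector(TaskSelector):
    """Illustrative default: highest priority weight first."""

    def select(self, candidates: list[QueuedTask], max_tis: int) -> list[QueuedTask]:
        # sorted() is stable, so tasks with equal weight keep their input order.
        ranked = sorted(candidates, key=lambda t: -t.priority_weight)
        return ranked[:max_tis]


def import_string(dotted_path: str):
    """Import ``package.module.ClassName`` and return the named attribute."""
    module_path, _, attr = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted path")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


def load_selector(dotted_path: str | None) -> TaskSelector:
    """Return the configured selector, or the default when none is set."""
    if dotted_path is None:
        return PriorityTaskSelector()
    cls = import_string(dotted_path)
    # Reject anything that does not implement the plug-in interface.
    if not (isinstance(cls, type) and issubclass(cls, TaskSelector)):
        raise TypeError(f"{dotted_path!r} is not a TaskSelector subclass")
    return cls()
```

With no path configured, `load_selector(None)` returns the default; a path such as `my_company.scheduling.FairShareSelector` (a made-up example) would be imported and instantiated instead. The subclass check is the only guard in this sketch, which hints at the support concern raised in the replies: nothing here validates that a custom selector behaves well under load.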

In the future, anyone could create a PR with their implementation and, if
enough people like it, it could be added to the repo.

This has already been done for the priority weights algorithm, so why not
in this case as well?

https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule

If there is positive feedback on this idea, I would like to implement it.

Please let me know what you think. Thank you!

Regards,
Christos

