Hello.

Asquator and I have already worked through this issue, and we have what we
think is a decent implementation of a pluggable task selection algorithm for
Airflow, available here:
https://github.com/Asquator/airflow/tree/feature/pessimistic-task-fetching-with-window-function

I agree that no single task selection algorithm will ever be ideal for every
Airflow use case, which is why I see this as a necessity rather than a
nice-to-have feature.

As we implemented it, Airflow can ship a few pre-implemented algorithms,
each solving a different issue, since not every user will run into every
issue. By making them properly pluggable through configuration, we can
document when to use each task selection algorithm, just as Jarek Potiuk
proposed. The algorithm would not be user-customizable; instead, the
implementations would live inside the airflow-core package and be injected
through configuration.
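To make the "injectable, not customizable" idea concrete, here is a rough
Python sketch. It is not the code from our branch, and every name in it
(TaskCandidate, TaskSelector, load_selector, and a [scheduler] task_selector
option) is hypothetical; nothing like it exists in Airflow today. The point is
only that users pick one of the algorithms shipped in core by name, rather
than supplying their own class.

```python
# Hypothetical sketch: none of these names exist in Airflow today.
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class TaskCandidate:
    """Minimal stand-in for a schedulable task instance (hypothetical)."""
    dag_id: str
    task_id: str
    priority_weight: int


class TaskSelector(Protocol):
    """Hypothetical interface every built-in selector would implement."""
    def select(self, candidates: list[TaskCandidate], max_tis: int) -> list[TaskCandidate]: ...


class PriorityFirstSelector:
    """Highest priority_weight first; ties keep their original order."""
    def select(self, candidates, max_tis):
        return sorted(candidates, key=lambda c: -c.priority_weight)[:max_tis]


class FifoSelector:
    """Keep the order in which candidates were fetched."""
    def select(self, candidates, max_tis):
        return list(candidates[:max_tis])


# Registry of algorithms shipped inside airflow-core. Users choose one by
# name through configuration; they cannot register arbitrary classes.
_BUILTIN_SELECTORS: dict[str, Callable[[], TaskSelector]] = {
    "priority_first": PriorityFirstSelector,
    "fifo": FifoSelector,
}


def load_selector(config_value: str) -> TaskSelector:
    """Resolve a (hypothetical) [scheduler] task_selector value to an instance."""
    try:
        return _BUILTIN_SELECTORS[config_value]()
    except KeyError:
        valid = ", ".join(sorted(_BUILTIN_SELECTORS))
        raise ValueError(
            f"Unknown task selector {config_value!r}; choose one of: {valid}"
        ) from None
```

Because the set of valid names is closed, the documentation can list every
option and when to use it, and an unknown value fails fast at startup.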

Of course, this comes with risks, such as users abusing it by trying to add
a new task selection algorithm for every edge case or use case they have,
which could become hard to maintain and follow. However, I do not agree
that it makes the code harder to maintain (in terms of code volume) or
easier to get wrong. If implemented correctly, each task selector is
independent, acts as a black box with a simple API, and can be swapped
without any code changes. In my opinion, that makes the existing algorithms
easier to maintain and removes the need to change a single large, sloppy
file, as we have to today.
In fact, I am convinced that making it pluggable will simplify the scheduler
overall, since its different parts will be clearly separated into different
files and directories.
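As an illustration of the black-box point, here is a minimal Python sketch
with two hypothetical selectors behind the same select() call. The names
(Candidate, PriorityFirst, RoundRobinPerDag, schedule_once) are invented for
this example and are not from Airflow or from our branch, and the round-robin
selector is just one simple way to reduce per-DAG starvation, not the
algorithm from any of the linked PRs. The scheduler-side function stays
identical whichever selector is plugged in.

```python
# Hypothetical sketch: the scheduler only sees select(); each selector is a black box.
from collections import defaultdict, deque
from typing import NamedTuple


class Candidate(NamedTuple):
    dag_id: str
    task_id: str
    priority_weight: int


class PriorityFirst:
    """Take the globally highest-priority candidates (can starve low-priority DAGs)."""
    def select(self, candidates, max_tis):
        return sorted(candidates, key=lambda c: -c.priority_weight)[:max_tis]


class RoundRobinPerDag:
    """Alternate between DAGs so one busy DAG cannot take every free slot."""
    def select(self, candidates, max_tis):
        # One queue per DAG, each ordered by priority (highest first).
        queues = defaultdict(deque)
        for c in sorted(candidates, key=lambda c: -c.priority_weight):
            queues[c.dag_id].append(c)
        picked = []
        while queues and len(picked) < max_tis:
            for dag_id in list(queues):  # iterate over a copy so we can delete
                if len(picked) >= max_tis:
                    break
                picked.append(queues[dag_id].popleft())
                if not queues[dag_id]:
                    del queues[dag_id]
        return picked


def schedule_once(selector, candidates, max_tis):
    """Stand-in for the scheduler's critical section: identical for every selector."""
    return [c.task_id for c in selector.select(candidates, max_tis)]


candidates = [
    Candidate("etl", "a", 10),
    Candidate("etl", "b", 9),
    Candidate("etl", "c", 8),
    Candidate("report", "x", 1),
]
print(schedule_once(PriorityFirst(), candidates, 2))     # ['a', 'b']
print(schedule_once(RoundRobinPerDag(), candidates, 2))  # ['a', 'x']
```

With two free slots, the priority-first selector gives both to the busy "etl"
DAG, while the round-robin one lets "report" run; switching between them
touches only the selector, not schedule_once.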

Allowing injectable algorithms gives us more flexibility, and could even make
adding the new priority weights algorithm quite simple without causing any
massive changes.

The main downside is that we have to choose the API very carefully: once it
is added, it will be exceptionally hard to change, because any change would
have to be made in multiple places and would therefore be considered a
breaking change.


On Mon, 1 Sept 2025 at 18:36, Christos Bisias <christos...@gmail.com> wrote:

> Hello,
>
> A while back I started a discussion on the mailing list regarding making
> some changes to the task selection query in order to improve the
> scheduler's throughput.
>
> https://github.com/apache/airflow/pull/54103
>
> Another topic came up during that discussion related to task starvation due
> to the current selection algorithm. There are two open PRs with different
> fixes for that issue.
>
> https://github.com/apache/airflow/pull/54284
>
> https://github.com/apache/airflow/pull/53492
>
> Everyone has his own needs and it's probable that a good number of users
> won't experience the starvation issue.
>
> Each approach has its own advantages and disadvantages and for that reason
> it doesn't feel like there is a right or wrong approach here or a single
> solution for all.
>
> There have been papers on task selection algorithms like this one
>
> https://ieeexplore.ieee.org/document/9799199
>
> I would like to suggest refactoring the scheduler so that the task
> selection algorithm can be pluggable. The current implementation will be
> the default. Everyone will be able to configure the path to his own class.
> That will be the most beneficial to the majority of users.
>
> In the future, anyone could create a PR with his implementation and if
> enough people like it, it could be added to the repo.
>
> This has already been done for the priority weights algorithm, so why not
> in this case as well?
>
>
> https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule
>
> If there is positive feedback on this idea, I would like to implement it.
>
> Please let me know what you think. Thank you!
>
> Regards,
> Christos
>
