Hello,

A while back I started a discussion on the mailing list about changing the
task selection query to improve the scheduler's throughput.

https://github.com/apache/airflow/pull/54103

During that discussion another topic came up: task starvation caused by
the current selection algorithm. There are two open PRs that propose
different fixes for that issue.

https://github.com/apache/airflow/pull/54284

https://github.com/apache/airflow/pull/53492

Everyone has their own needs, and a good number of users probably won't
experience the starvation issue.

Each approach has its own advantages and disadvantages, so it doesn't feel
like there is a right or wrong approach here, or a single solution that
fits everyone.

There have been papers on task selection algorithms, like this one:

https://ieeexplore.ieee.org/document/9799199

I would like to suggest refactoring the scheduler so that the task
selection algorithm is pluggable. The current implementation would remain
the default, and anyone could configure the path to their own class. I
believe that would be the most beneficial option for the majority of users.
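
To make the idea more concrete, here is a rough sketch of what such a
plugin point could look like. It is only an illustration, not the
scheduler's actual code: Candidate, TaskSelector, PrioritySelector,
RoundRobinSelector and load_selector are all hypothetical names, and the
"default" below is a simplified stand-in for the current behaviour.

```python
from __future__ import annotations

import importlib
from abc import ABC, abstractmethod
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a task instance that is ready to be queued."""

    task_id: str
    dag_id: str
    priority_weight: int


class TaskSelector(ABC):
    """Hypothetical plugin interface: decide which candidates to queue next."""

    @abstractmethod
    def select(self, candidates: list[Candidate], max_tis: int) -> list[Candidate]:
        """Return at most max_tis candidates, in the order they should be queued."""


class PrioritySelector(TaskSelector):
    """Simplified stand-in for the default: highest priority weight first."""

    def select(self, candidates: list[Candidate], max_tis: int) -> list[Candidate]:
        ordered = sorted(candidates, key=lambda c: -c.priority_weight)
        return ordered[:max_tis]


class RoundRobinSelector(TaskSelector):
    """Example alternative: take one task per DAG in turn to limit starvation."""

    def select(self, candidates: list[Candidate], max_tis: int) -> list[Candidate]:
        # Group candidates by DAG, keeping the highest priority first in each group.
        queues: dict[str, deque[Candidate]] = {}
        for c in sorted(candidates, key=lambda c: -c.priority_weight):
            queues.setdefault(c.dag_id, deque()).append(c)

        picked: list[Candidate] = []
        while queues and len(picked) < max_tis:
            for dag_id in list(queues):
                if len(picked) >= max_tis:
                    break
                picked.append(queues[dag_id].popleft())
                if not queues[dag_id]:
                    del queues[dag_id]
        return picked


def load_selector(path: str) -> TaskSelector:
    """Instantiate a selector from a dotted path such as 'my_pkg.MySelector'."""
    module_name, _, class_name = path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    if not (isinstance(cls, type) and issubclass(cls, TaskSelector)):
        raise TypeError(f"{path} is not a TaskSelector subclass")
    return cls()
```

The dotted path passed to load_selector would come from a configuration
option (its name is still to be decided), with the current implementation
used when nothing is set.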

In the future, anyone could open a PR with their implementation and, if
enough people like it, it could be added to the repo.

This has already been done for the priority weights algorithm, so why not
in this case as well?

https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule

If there is positive feedback on this idea, I would like to implement it.

Please let me know what you think. Thank you!

Regards,
Christos
