To briefly summarise my points in Slack on this:

I don’t want individual pluggable components of the scheduler, as the support 
burden of all the possible combinations of them is frankly scary.

If we do want to make the Scheduler behaviour pluggable in any way, it should
be one thing: the JobRunner class that `airflow scheduler` runs. If that is
pluggable, then the scheduler is a class with an `execute()` method whose job
is to stay running and feed tasks to executors when they should be executed.
If you just want to change a bit, you can perhaps subclass it; if you want to
change a lot, you can write a completely custom class. PRs that refactor the
existing SchedulerJob to better encapsulate its responsibilities might be
accepted. Roughly speaking, the scheduler has three main responsibilities:
https://github.com/apache/airflow/blob/bf36a6c292b75d1f8d06b86d83b0138b46e1aa35/airflow-core/src/airflow/jobs/scheduler_job_runner.py#L1398-L1426
 - Most of the changes being discussed would, I think, be inside the
`_critical_section_enqueue_task_instances` function:
https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/scheduler_job_runner.py#L672-L690
 ("critical section" meaning that at most one scheduler can be in that block at
a time).
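For illustration only, a minimal Python sketch of that "one pluggable runner" shape. Everything in it is hypothetical: `BaseSchedulerRunner`, `RoundRobinRunner`, `load_runner_class`, the hook names, the `max_per_loop` cap and the "dag_id.task_id" string tasks are invented for this example and are not Airflow's `SchedulerJobRunner` API. The `threading.Lock` only stands in for the critical section; it excludes threads within one process, whereas the real block has to keep multiple scheduler processes out.

```python
from __future__ import annotations

import importlib
import threading
import time
from collections.abc import Callable


class BaseSchedulerRunner:
    """Hypothetical single pluggable runner: execute() loops and feeds tasks out."""

    def __init__(
        self,
        submit: Callable[[str], None],
        max_loops: int | None = None,
        max_per_loop: int = 32,
        poll_interval: float = 0.0,
    ) -> None:
        self.submit = submit  # stand-in for handing a task to an executor
        self.max_loops = max_loops  # None means loop forever, like a real scheduler
        self.max_per_loop = max_per_loop  # hypothetical cap on tasks queued per loop
        self.poll_interval = poll_interval
        self.pending: list[str] = []  # task ids of the form "dag_id.task_id"
        # A threading.Lock only excludes threads in this process; a real
        # multi-scheduler critical section would need a database-level lock.
        self._critical_section = threading.Lock()

    def execute(self) -> None:
        loops = 0
        while self.max_loops is None or loops < self.max_loops:
            self.create_dag_runs()
            self.schedule_dag_runs()
            with self._critical_section:
                for task in self.select_tasks():
                    self.submit(task)
            loops += 1
            time.sleep(self.poll_interval)

    # Placeholder hooks: the real work of these steps is omitted in this sketch.
    def create_dag_runs(self) -> None:
        pass

    def schedule_dag_runs(self) -> None:
        pass

    def select_tasks(self) -> list[str]:
        """Default policy: take pending tasks in list order (first in, first out)."""
        return self._take(list(self.pending))

    def _take(self, ordered: list[str]) -> list[str]:
        chosen = ordered[: self.max_per_loop]
        for task in chosen:
            self.pending.remove(task)  # unchosen tasks wait for the next loop
        return chosen


class RoundRobinRunner(BaseSchedulerRunner):
    """Hypothetical subclass that overrides only the selection step."""

    def select_tasks(self) -> list[str]:
        queues: dict[str, list[str]] = {}
        for task in self.pending:
            queues.setdefault(task.split(".", 1)[0], []).append(task)
        ordered: list[str] = []
        # Take one task per DAG per pass so one busy DAG cannot crowd out others.
        while any(queues.values()):
            for queue in queues.values():
                if queue:
                    ordered.append(queue.pop(0))
        return self._take(ordered)


def load_runner_class(path: str) -> type[BaseSchedulerRunner]:
    """Resolve a dotted path such as "my_pkg.runners.RoundRobinRunner" (hypothetical)."""
    module_name, _, class_name = path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    if not (isinstance(cls, type) and issubclass(cls, BaseSchedulerRunner)):
        raise TypeError(f"{path} is not a BaseSchedulerRunner subclass")
    return cls


if __name__ == "__main__":
    submitted: list[str] = []
    runner = RoundRobinRunner(submit=submitted.append, max_loops=1, max_per_loop=2)
    runner.pending = ["a.1", "a.2", "a.3", "b.1"]
    runner.execute()
    print(submitted)  # ['a.1', 'b.1']
```

The subclass changes only task selection (round-robin across DAGs instead of list order), so with two tasks allowed per loop `b.1` is queued alongside `a.1` instead of waiting behind `a.2` and `a.3`. Whether the class path would come from a config option is likewise an assumption.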

Any alternative Scheduler algorithms should first be developed out-of-tree
(i.e. not in the apache/airflow repo) to gain stability and be production-tested
and hardened before we support them. For better or for worse, Airflow "is" the
scheduler, bugs and all.

> On 14 Sep 2025, at 22:53, Daniel Standish via dev <dev@airflow.apache.org> 
> wrote:
> 
> I have a feeling that ultimately, in order to bring performance
> enhancements to the scheduler, we may need to make some changes to its
> behavior: for example, simplify or reduce some of the work it has to do.
> 
> That is, reduce some of the power users have to express weird conditional
> scheduling logic, as well as some configs and priority promises. But this
> is mostly an intuitive feeling at this point.
> 
> On Sun, Sep 14, 2025 at 3:15 AM Christos Bisias <christos...@gmail.com>
> wrote:
> 
>> I get your points, and I also had an offline discussion on Slack with Ash,
>> who had a similar opinion. He pointed out that each new algorithm is a new
>> scheduler, leading to an unwanted maintenance burden.
>> 
>> I’m not going to pursue this any further. Thank you for your replies!
>> 
>> Christos
>> 
>> On Sat, Sep 13, 2025 at 10:37 Jens Scheffler <jsche...@apache.org> wrote:
>> 
>>> I see a bit of a risk, as the scheduler code is quite complex...
>>> (similar to Jarek): if somebody sees this and plugs something in, I assume
>>> that in most cases this would make it worse. It also locks us into a plugin
>>> API and removes flexibility if we need to change/refactor something.
>>> 
>>> On the other hand, I also fear a bit that the Scheduler is very complex,
>>> and adding multiple parallel strategies adds redundant code paths, which
>>> makes it hard to maintain: load tests etc. must validate that neither
>>> degrades, and features need to be added to both.
>>> 
>>> So I'd favor keeping it to a single (maybe configurable) logic.
>>> 
>>> Unfortunately, I haven't had the mental capacity to drill into the
>>> discussion and details so far; the beast of SQL code that was shared was
>>> frightening me a bit.
>>> 
>>> Jens
>>> 
>>> On 13.09.25 07:06, Jarek Potiuk wrote:
>>>> I think that even if we do it, this should only be something internal. I
>>>> don't see why we should make it customizable. If we want to choose between
>>>> different algorithms, we should explicitly tell users why they should
>>>> choose a different algorithm and make sure we have data backing it up.
>>>> There is absolutely no way we can make it available for users to override
>>>> and use their own implementation, because we would have to support
>>>> whatever someone implemented.
>>>> 
>>>> On Thu, Sep 4, 2025 at 3:08 PM Christos Bisias <christos...@gmail.com>
>>>> wrote:
>>>> 
>>>>> I’d appreciate any feedback on this.
>>>>> 
>>>>> On Mon, Sep 1, 2025 at 18:35 Christos Bisias <christos...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> A while back I started a discussion on the mailing list regarding
>>>>>> making some changes to the task selection query in order to improve
>>>>>> the scheduler's throughput.
>>>>>> 
>>>>>> https://github.com/apache/airflow/pull/54103
>>>>>> 
>>>>>> Another topic came up during that discussion: task starvation caused
>>>>>> by the current selection algorithm. There are two open PRs with
>>>>>> different fixes for that issue.
>>>>>> 
>>>>>> https://github.com/apache/airflow/pull/54284
>>>>>> 
>>>>>> https://github.com/apache/airflow/pull/53492
>>>>>> 
>>>>>> Everyone has their own needs, and it's likely that a good number of
>>>>>> users won't experience the starvation issue.
>>>>>> 
>>>>>> Each approach has its own advantages and disadvantages, and for that
>>>>>> reason it doesn't feel like there is a right or wrong approach here, or
>>>>>> a single solution for all.
>>>>>> 
>>>>>> There have been papers on task selection algorithms, such as this one:
>>>>>> 
>>>>>> https://ieeexplore.ieee.org/document/9799199
>>>>>> 
>>>>>> I would like to suggest refactoring the scheduler so that the task
>>>>>> selection algorithm is pluggable. The current implementation would be
>>>>>> the default, and everyone would be able to configure the path to their
>>>>>> own class. That would be the most beneficial to the majority of users.
>>>>>> 
>>>>>> In the future, anyone could create a PR with their implementation, and
>>>>>> if enough people like it, it could be added to the repo.
>>>>>> 
>>>>>> This has already been done for the priority weights algorithm, so why
>>>>>> not in this case as well?
>>>>>> 
>>>>>> https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule
>>>>>> 
>>>>>> If there is positive feedback on this idea, I would like to implement it.
>>>>>> 
>>>>>> Please let me know what you think. Thank you!
>>>>>> 
>>>>>> Regards,
>>>>>> Christos
>>>>>> 
>>> 
>> 
