> Therefore I'd propose (1) a pragmatic fix that can be made NOW as a bugfix, and that is a global config switch similar to pools
As long as the users who rely on it have a way to bring it back, yes, it's good for me.

> I do not understand why the discussion here is more precise... when the
> mentioned change about max_active_tasks was not respecting this at all and
> was semantically just a breaking change... yeah, my bad that I missed the
> discussion, but this is really, really a problem now for us :-(

This only highlights that those discussions should be longer and dig deeper -
even in the original lazy consensus Daniel mentioned an "admittedly brief
discussion". That's why we should make sure that any change in behaviour is
discussed and has a way to go back.

And if it really is such a big problem now, causing real issues with no
alternative, I absolutely see a possibility of reverting the decision
implemented in #42953 - would it solve your problem? Should we do it? Or
maybe there is an alternative way for you to restore the limits?

J.

On Mon, Mar 9, 2026 at 9:45 PM Jens Scheffler <[email protected]> wrote:
> Hi Jarek, et al.
>
> I assume there are many cases. Some people might want to limit the
> "workers", and in those cases deferred tasks should likely not count in,
> while in other cases the limits are rather defined to protect backends
> (as with pools). This is also why we (urgently) need this and why the bug
> came up.
>
> I think adding more parameters and options, and renaming them all to
> proper naming, is a mid- to long-term exercise, especially as it involves
> Dag migration for all users. So this is a change that will maybe only see
> a final cleanup in Airflow 4. We are not greenfield, thus we need backward
> compatibility anyway. Aligning on the naming (which is always the
> hardest...) and making this change will take longer.
>
> Therefore I'd propose (1) a pragmatic fix that can be made NOW as a
> bugfix, and that is a global config switch similar to pools, aiming to
> decide on a global level whether deferred tasks count in or out.
> And, following up, (2) a rework of the limitation parameters, which -
> quite frankly - are a bit fragmented across various areas.
>
> Would this be OK for most?
>
> Jens
>
> P.S.: I need to highlight here that after the recent migration to Airflow 3
> in our production we had serious problems and very bad feedback from our
> users over the last days. It took us more than a day to understand and
> drill into the root cause - so multiple person-days wasted until we found
> the root in the discussion
> https://lists.apache.org/thread/nn4y1z0yrydkmw9np4f0z5lm9gh8tmfl with
> the lazy consensus
> https://lists.apache.org/thread/9o84d3yn934m32gtlpokpwtbbmtxj47l and the
> PR https://github.com/apache/airflow/pull/42953 causing this trouble. Why?
>
> Because in Airflow 2 we used max_active_tasks, which defaulted to 16, as a
> safety net; everybody could write their Dag. If somebody wanted to scale
> larger, then a PR increasing max_active_tasks > 16 triggered a review, and
> we could see how many resources each Dag effectively took. With the
> (badly documented!) semantic change in Airflow 3, a lot of workload now
> runs unrestricted because it relies on mapped tasks, and the alternative
> max_active_tis_per_dag does not default to 16 like the previous one did,
> and is also only counted per task... so if you have a Dag with multiple
> mapped tasks, and pools are needed for other limits, this is a problem
> now.
>
> I do not understand why the discussion here is more precise... when the
> mentioned change about max_active_tasks was not respecting this at all
> and was semantically just a breaking change... yeah, my bad that I missed
> the discussion, but this is really, really a problem now for us :-(
>
> On 09.03.26 10:18, Jarek Potiuk wrote:
> > +1 on what TP and Karthikeyan wrote. We need a solid proposal for naming
> > and explicitly defining those terms, along with a way for users to keep
> > the old counting method (settable per Dag).
> > And I think it would be OK to change the default behaviour as long as we
> > are very clear in documenting it, explaining that this is really a "bug
> > fix" (in the sense that this behaviour was really not intentional, and by
> > changing it we express our intentions explicitly), and allow the users to
> > go back easily in Dags that rely on it - so that they can maybe rework
> > them in the future and remove it.
> >
> > On Mon, Mar 9, 2026 at 8:35 AM Karthikeyan <[email protected]> wrote:
> >
> >> +1 on having a field to restore backwards compatibility at the Dag level
> >> if the Dag parameter is being changed. Most of our workloads involve
> >> submitting jobs to Spark and other upstream systems, and each user has a
> >> corresponding pool. With deferred not being counted as active, users had
> >> issues where more submissions were made than the upstream could handle.
> >> So for those users the pool was updated. There are other workloads, like
> >> HTTP-based defers, where users just poll and don't need to worry about
> >> the upstream capacity.
> >>
> >> I guess deferred was initially documented such that the pool slot is
> >> released and there can be more concurrent workloads. The definition of
> >> being counted as active depends on the workload and use case. It will be
> >> helpful to have this behaviour as optional and opt-in to avoid
> >> confusion.
> >>
> >> Thanks
> >>
> >> Regards,
> >> Karthikeyan S
> >>
> >> On Mon, Feb 23, 2026, 9:43 PM Vikram Koka via dev <[email protected]> wrote:
> >>
> >>> I definitely agree with the intent, but I am concerned about the actual
> >>> implications of making this change from a user experience perspective.
> >>>
> >>> With respect to pools, I would like an updated perspective on how
> >>> useful and used this is today. For example, I suspect that the async
> >>> Python operator change coming in the new AIP as part of 3.2 does not
> >>> respect the pools configuration either.
> >>> The max active task configurations are very useful while using the
> >>> Celery executor, which is the majority today. I got a bunch of
> >>> questions around this as part of the backfill enhancements in 3.0.
> >>>
> >>> I hesitate to make changes to these configuration options without a
> >>> clear understanding and articulation of the tradeoffs.
> >>>
> >>> Just my two cents,
> >>> Vikram
> >>>
> >>> On Mon, Feb 23, 2026 at 2:34 AM Wei Lee <[email protected]> wrote:
> >>>
> >>>> I like what Jarek suggested, but we should avoid using the term
> >>>> "Running". From Airflow's perspective, a Deferred task is not
> >>>> considered a Running task, even though it may be viewed differently in
> >>>> the user's context.
> >>>>
> >>>> Additionally, we are currently using the term "Executing" here:
> >>>> https://github.com/apache/airflow/blob/e0cd6e246c288d33f359ec2268b3d342832e9648/airflow-core/src/airflow/utils/state.py#L67
> >>>> Maybe we can count Deferred and Running tasks as "Executing"? The
> >>>> thing that kinda bugs me is that "Deferred" is also an
> >>>> IntermediateTIState here.
> >>>>
> >>>> On 2026/02/22 20:22:45 Natanel wrote:
> >>>>> Hello Jens, I agree with everything you said. For some reason the
> >>>>> "Deferred" state is not counted towards an active task, where
> >>>>> intuitively it should be part of the group.
> >>>>>
> >>>>> As I see it, all the configurations talk about *active* tasks (such
> >>>>> as max_active_tasks, max_active_tis_per_dag,
> >>>>> max_active_tis_per_dagrun), which I think is quite a confusing term.
> >>>>> To solve this, a clear explanation of what an "active" task is should
> >>>>> be defined.
> >>>>> It is possible to define that an "active" task is any task which is
> >>>>> either running, queued OR deferred, but this will require a new
> >>>>> configuration for backwards compatibility, such as
> >>>>> "count_deferred_as_active" (yet this is a more enforcing and global
> >>>>> approach, which we might not want), while not introducing too much
> >>>>> additional complexity. Adding more parameters by which we schedule
> >>>>> tasks will only make scheduling decisions harder, as more parameters
> >>>>> need to be checked, which will most likely slow down each decision
> >>>>> and might slow down the scheduler.
> >>>>>
> >>>>> I liked Jarek's approach; however, I think that maybe instead of
> >>>>> introducing a few new params, we instead rename the current
> >>>>> parameters while keeping behaviour as-is, slowly deprecating the
> >>>>> "active" configurations, as Jarek said, and for some time keep both
> >>>>> the "active" and the "running" param, with "active" being an alias
> >>>>> for "running" until "active" is deprecated.
> >>>>>
> >>>>> If there is a need for a param for deferred tasks, it is possible to
> >>>>> add one only for deferrable tasks, in order not to impact current
> >>>>> scheduling decisions made by the scheduler.
> >>>>>
> >>>>> I see both approaches as viable, yet I think that adding an
> >>>>> additional param might introduce more complexity and maybe should be
> >>>>> split out of the regular task flow, as a deferrable task is not the
> >>>>> same as a running task. I tend to lean towards the first approach, as
> >>>>> it seems to be the simplest; however, the second approach might be
> >>>>> more beneficial long-term.
> >>>>>
> >>>>> Best Regards,
> >>>>> Natanel
> >>>>>
> >>>>> On Sun, 22 Feb 2026 at 18:43, Jarek Potiuk <[email protected]> wrote:
> >>>>>
> >>>>>> +1.
> >>>>>> But I think that there are cases where people wanted to
> >>>>>> **actually** use `max_*` to limit how many workers the DAG or DAG
> >>>>>> run will take. So possibly we should give them such an option—for
> >>>>>> example, max_running_tis_per_dag, etc.
> >>>>>>
> >>>>>> There is also the question of backward compatibility. I can see the
> >>>>>> possibility of side effects - if that changes "suddenly" after an
> >>>>>> upgrade. For example, it might mean that some Dags will suddenly
> >>>>>> start using far fewer workers than before and become starved.
> >>>>>>
> >>>>>> So - if we want to change it, I think we should deprecate "_active"
> >>>>>> and possibly add two new sets of parameters with different
> >>>>>> names—but naming in this case is hard (more than usual).
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>> On Sun, Feb 22, 2026 at 5:25 PM Pavankumar Gopidesu
> >>>>>> <[email protected]> wrote:
> >>>>>>> Hi Jens,
> >>>>>>>
> >>>>>>> Thanks for starting this discussion. I agree that we should update
> >>>>>>> how these tasks are counted.
> >>>>>>>
> >>>>>>> I previously started a PR[1] to include deferred tasks in
> >>>>>>> max_active_tasks, but I was sidetracked by other priorities. As you
> >>>>>>> noted, this change needs to encompass not only max_active_tasks
> >>>>>>> but also the other parameters you described.
> >>>>>>>
> >>>>>>> [1]: https://github.com/apache/airflow/pull/41560
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Pavan
> >>>>>>>
> >>>>>>> On Sun, Feb 22, 2026 at 12:43 PM constance.astronomer.io via dev
> >>>>>>> <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Agreed.
> >>>>>>>> In my opinion, the only time we should not be counting deferred
> >>>>>>>> tasks is for configurations that control worker slots, like the
> >>>>>>>> number of tasks that run concurrently on a Celery worker, since
> >>>>>>>> tasks in a deferred state are not running on a worker (although
> >>>>>>>> you can argue that a triggerer is a special kind of worker, but I
> >>>>>>>> digress).
> >>>>>>>>
> >>>>>>>> For the examples you've listed, deferred tasks should be part of
> >>>>>>>> the equation since the task IS running, just not on a traditional
> >>>>>>>> worker.
> >>>>>>>>
> >>>>>>>> Thanks for bringing this up! This has been bothering me for a
> >>>>>>>> while.
> >>>>>>>>
> >>>>>>>> Constance
> >>>>>>>>
> >>>>>>>>> On Feb 22, 2026, at 4:18 AM, Jens Scheffler <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi There!
> >>>>>>>>>
> >>>>>>>>> TLDR: In fix PR https://github.com/apache/airflow/pull/61769 we
> >>>>>>>>> came to the point that today in Airflow Core the "Deferred"
> >>>>>>>>> state seems to be counted inconsistently. I would propose to
> >>>>>>>>> consistently count "Deferred" into the counts of "Running".
> >>>>>>>>>
> >>>>>>>>> Details:
> >>>>>>>>>
> >>>>>>>>>  * In Pools it has been possible for a while (since PR
> >>>>>>>>>    https://github.com/apache/airflow/pull/32709) to decide
> >>>>>>>>>    whether tasks in the deferred state are counted into the pool
> >>>>>>>>>    allocation or not.
> >>>>>>>>>  * Before that, Deferred tasks were not counted in, which meant
> >>>>>>>>>    deferred tasks could potentially overwhelm backends, which
> >>>>>>>>>    defeated the purpose of pools.
> >>>>>>>>>  * Recently it was also seen that other limits we usually have
> >>>>>>>>>    on Dags, defined as follows, do not consistently include
> >>>>>>>>>    deferred tasks in the limits:
> >>>>>>>>>     o max_active_tasks - `The number of task instances allowed
> >>>>>>>>>       to run concurrently`
> >>>>>>>>>     o max_active_tis_per_dag - `When set, a task will be able
> >>>>>>>>>       to limit the concurrent runs across logical_dates.`
> >>>>>>>>>     o max_active_tis_per_dagrun - `When set, a task will be
> >>>>>>>>>       able to limit the concurrent task instances per Dag run.`
> >>>>>>>>>  * This means that, at the moment, defining a task as
> >>>>>>>>>    async/deferred escapes the limits.
> >>>>>>>>>
> >>>>>>>>> Code references:
> >>>>>>>>>
> >>>>>>>>>  * Counting tasks in the Scheduler on main:
> >>>>>>>>>    https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/scheduler_job_runner.py#L190
> >>>>>>>>>  * EXECUTION_STATES used for counting:
> >>>>>>>>>    https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ti_deps/dependencies_states.py#L21
> >>>>>>>>>     o Here "Deferred" is missing!
> >>>>>>>>>
> >>>>>>>>> Alternatives that I see:
> >>>>>>>>>
> >>>>>>>>>  * Fix it in the Scheduler consistently so that the limits
> >>>>>>>>>    always count Deferred in.
> >>>>>>>>>  * There might be a historic reason that Deferred is not
> >>>>>>>>>    counted in - then proper documentation would be needed - but
> >>>>>>>>>    I'd assume this is unlikely.
> >>>>>>>>>  * There are different opinions - then the behavior might need
> >>>>>>>>>    to be configurable.
> >>>>>>>>>    (But personally I cannot see a reason for having deferred
> >>>>>>>>>    escape the defined limits.)
> >>>>>>>>>
> >>>>>>>>> Jens
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: [email protected]
> >>>> For additional commands, e-mail: [email protected]
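The counting question the thread revolves around can be made concrete with a small sketch. This is plain illustrative Python, not actual Airflow scheduler code; the flag name `count_deferred_as_active` is taken from Natanel's hypothetical suggestion above, and the helper functions are invented for the example. It shows how a global switch could decide whether deferred task instances occupy slots under a `max_active_tasks`-style limit:

```python
# Illustrative sketch only: how a global "count deferred as active" switch
# could interact with a max_active_tasks limit. Not actual Airflow code.

RUNNING = "running"
QUEUED = "queued"
DEFERRED = "deferred"

# States that always count as "active" for concurrency limits.
BASE_ACTIVE_STATES = {RUNNING, QUEUED}


def active_task_count(task_states, count_deferred_as_active):
    """Count task instances that occupy a slot under the limit.

    With the flag enabled, deferred tasks are counted too, so an
    async/deferred task can no longer escape the limit.
    """
    active = set(BASE_ACTIVE_STATES)
    if count_deferred_as_active:
        active.add(DEFERRED)
    return sum(1 for state in task_states if state in active)


def can_schedule_more(task_states, max_active_tasks, count_deferred_as_active):
    """True if the Dag may start another task instance under the limit."""
    return active_task_count(task_states, count_deferred_as_active) < max_active_tasks


# Example: 10 running and 8 deferred task instances against a limit of 16
# (the Airflow 2 default Jens mentions).
states = [RUNNING] * 10 + [DEFERRED] * 8

# Deferred not counted: only 10 "active", so more tasks may start.
print(can_schedule_more(states, 16, count_deferred_as_active=False))  # True

# Deferred counted: 18 "active", limit exceeded, no new tasks start.
print(can_schedule_more(states, 16, count_deferred_as_active=True))  # False
```

With the flag off, the Dag in the example keeps launching work even though 18 task instances are in flight, which is exactly the "deferred escapes the limits" behaviour described in the TLDR; with it on, the 16-task safety net holds again.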
