Fair enough. Do we want to document that? Vikram, does that answer your question?
Thanks & Regards,
Amogh Desai

On Wed, Mar 11, 2026 at 3:00 AM Ferruzzi, Dennis <[email protected]> wrote:

> Currently the celery executor's implementation pushes all callbacks to the
> "default" queue unless the user defined a queue in the callback definition
> [1]. We could add some logic to make it default to whichever queue spawned
> it, I suppose... that may make more sense. But the current behavior is
> "default unless you set one," which feels intuitive.
>
> [1]
> https://github.com/apache/airflow/blob/ed237dff7c9c6ef6a25be34a243f5645ab0ccf67/providers/celery/src/airflow/providers/celery/executors/celery_executor.py#L173
> ________________________________
> From: Amogh Desai <[email protected]>
> Sent: Tuesday, March 10, 2026 3:08 AM
> To: [email protected] <[email protected]>
> Subject: RE: [EXT] [AIP-86] Deadline Callback queuing and priority
>
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
> Apologies for the delayed response, I was OOO last week.
>
> Thanks for writing this up, Dennis -- really helpful to have the rationale
> documented in one place.
>
> The reasoning behind the "two collection" approach, with callbacks having
> higher priority than tasks, makes sense to me.
>
> I think shipping with FIFO and no `max_executor_callbacks` for now is fine
> too.
>
> Now, building on Vikram's question, one thing that might be worth
> clarifying is: in a multi-queue executor setup (celery with multiple
> workers/queues), which *queue* do the callbacks get routed to?
> (what happens when the executor actually dispatches the callback to a
> worker) Is it always the default queue, or is there some sort of affinity
> to the other queues? Might be worth documenting even if the answer is a
> no-brainer.
>
> Thanks & Regards,
> Amogh Desai
>
> On Wed, Mar 4, 2026 at 5:42 AM Ferruzzi, Dennis <[email protected]>
> wrote:
>
> > We are going to go round and round on the "I think I know what you mean,
> > but..." carousel 😄
> >
> > I suspect you are thinking in terms of Airflow Queues like the Celery
> > Queue, where you can direct tasks to a specific Celery worker. I was
> > using "callback queue" to refer to an internal data structure within the
> > executor, not a user-facing Queue. I could have easily called it a FIFO
> > collection or list or whatever, and maybe I should have.
> >
> > Internally, the executor has two collections: one stores the callbacks
> > in FIFO order, and one stores the prioritized tasks. When a slot opens
> > up, it takes the top of the callback list if any are available, then
> > fills the remaining slots from the list of tasks. From the user's point
> > of view it's one prioritized queue where all callbacks get top priority.
> >
> > I suspect there is confusion because I've somewhat overloaded the term
> > Queue. I'm not sure there is any configuration to update.
> >
> > - ferruzzi
> > ________________________________
> > From: Vikram Koka via dev <[email protected]>
> > Sent: Tuesday, March 3, 2026 8:54 AM
> > To: [email protected] <[email protected]>
> > Cc: Vikram Koka <[email protected]>
> > Subject: RE: [EXT] [AIP-86] Deadline Callback queuing and priority
> >
> > Dennis,
> >
> > Thank you for writing this up.
> >
> > I completely missed this concept of “callback queue” as a parallel FIFO
> > to the task queue during the AIP discussions and in the AIP document
> > itself. So, this definitely helps clarify that part for me.
> >
> > I understand the comment about honoring the parallelism config vs. not
> > honoring the DAG-level parallelism configurations.
> >
> > What I am not clear about is how this “callback queue” maps to the
> > existing queues already configured within Airflow deployments, when
> > multiple queues are configured. I can guess, but would prefer to have a
> > definitive answer based on your thinking and implementation.
> > This then leads to the updated configuration needed to take advantage of
> > this feature.
> >
> > Best regards,
> > Vikram
> >
> > On Fri, Feb 27, 2026 at 3:27 PM Ferruzzi, Dennis <[email protected]>
> > wrote:
> >
> > > I think a lot of this has been discussed in dev calls but maybe never
> > > got documented anywhere for folks who aren't/weren't on the call. I
> > > haven't been very good about keeping the AIP up to date after it was
> > > approved, or keeping community decisions and discussions easily
> > > discoverable, and that's on me. Now that folks can see the feature
> > > as-implemented, Amogh raised some good questions that are worth
> > > addressing here so the rationale is captured somewhere persistent. I
> > > also want to flag a couple of ideas that came out of that discussion
> > > as potential follow-up improvements.
> > >
> > > The plan has been that we treat a deadline callback as "finishing old
> > > work" and therefore all callbacks get priority over new tasks.
> > > The core design principle here is that deadline callbacks need to be
> > > timely — if a deadline fires, the user expects to know about it
> > > promptly. A deadline callback that gets stuck behind the same
> > > bottleneck it's trying to alert you about isn't useful.
> > >
> > > To that end, what I implemented was effectively two queues:
> > > task_instances get queued by priority_weight (as they "always" have),
> > > and callbacks get queued in FIFO order. When the scheduler looks for
> > > new work it will always fill slots from the callback queue first.
> > > Parallelism is honored, but pools and max_active_tasks_per_dag are
> > > ignored. The reason for that is twofold. Let's say a user has a Dag
> > > with a 10-minute deadline. There are two cases where that deadline
> > > can be triggered:
> > >
> > > Case 1) The Dag is still running at the deadline, and the callback
> > > triggers - maybe a Slack message that the report is still being
> > > generated and may be delayed today - while the Dag still runs.
> > >
> > > In this case, `pools` and `tasks_per_dag` may make sense since the
> > > Dag is still active, but I'd argue that here the deadline should
> > > "break through" that pool/task limit and execute regardless. For
> > > example, if the issue is that the tasks stalled because the worker is
> > > hitting resource constraints, then tacking the deadline callback on
> > > behind that roadblock and waiting for the tasks to finish before
> > > alerting the user that there is a problem defeats the intent of the
> > > Deadline. It is still bound by the executor-level parallelism, so
> > > it's not allowed to just run rampant, but it isn't bound by the
> > > Dag-level constraints.
> > >
> > > Case 2) The Dag failed at 2 minutes and the callback is triggered at
> > > the 2-minute point. At this point there is no `pool` or `max_tasks`
> > > to consider. The task instances should have released their slots and
> > > aren't being counted toward the active tasks. Even if other Dags have
> > > started up and are claiming pool slots, this callback isn't part of
> > > that Dag and shouldn't be lumped in with its tasks.
> > >
> > > A `max_executor_callbacks` setting, paralleling `max_executor_tasks`,
> > > is one idea that has come up. It's not a bad idea; I guess if the
> > > whole building is burning down, you don't need to know that each
> > > floor is burning. It feels a bit against the intention of callbacks
> > > getting prioritized over tasks, but if it's a user-defined option
> > > then that's on the user. It might be worth considering as a follow-up
> > > improvement if anyone thinks it's something we really need.
> > >
> > > The other decision that seems to be contentious is that callbacks are
> > > FIFO instead of implementing a `callback-priority-weight`. FIFO seems
> > > fine for a callback queue and for how I envision the feature being
> > > used, but maybe once the feature gets into the hands of users they'll
> > > find a need for it. We as a community have been saying for a while
> > > that there are way too many user-facing knobs in the settings, and
> > > this felt like a logical place for us to be opinionated in the code.
> > > With FIFO, the feature's behavior is straightforward and predictable,
> > > and it's easier to add the weight later if there is demand than it
> > > would be to remove it later if we decide to prune the config options
> > > in a future version.
> > >
> > > For now, I think we're in a good place to ship with the current
> > > behavior and iterate based on real-world usage. I'll cut some Issues
> > > to make sure the follow-up ideas from here and the PR are tracked so
> > > they don't get lost. If anyone has concerns about the current
> > > behavior that should be addressed before launch, let me know.
> > >
> > > - ferruzzi
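The two-collection scheduling described above can be sketched roughly as
follows: a FIFO of callbacks is drained before a priority heap of tasks,
bounded only by the open parallelism slots. This is a hypothetical sketch
with made-up structure and names, not the scheduler's actual code:

```python
import heapq
from collections import deque

def fill_open_slots(callbacks: deque, task_heap: list, open_slots: int) -> list:
    """Pick up to open_slots units of work; callbacks always go first.

    callbacks: FIFO of callback ids. task_heap: heapified list of
    (-priority_weight, task_id) tuples, so the highest weight pops first.
    """
    chosen = []
    while open_slots > 0 and callbacks:
        chosen.append(callbacks.popleft())  # FIFO order, no weights
        open_slots -= 1
    while open_slots > 0 and task_heap:
        _, task_id = heapq.heappop(task_heap)  # highest priority_weight
        chosen.append(task_id)
        open_slots -= 1
    return chosen
```

Note that pools and max_active_tasks_per_dag don't appear here at all,
which matches the behavior described in the thread: only the
executor-level parallelism (the open_slots cap) constrains callbacks.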
