Re: [AIP-86] Deadline Callback queuing and priority

Amogh Desai Tue, 10 Mar 2026 22:58:31 -0700

Fair enough. Do we want to document that?

Vikram, does that answer your question?


Thanks & Regards,
Amogh Desai


On Wed, Mar 11, 2026 at 3:00 AM Ferruzzi, Dennis <[email protected]>
wrote:

> Currently the Celery executor's implementation pushes all callbacks to the
> "default" queue unless the user defined a queue in the callback definition
> [1].  We could add some logic to make it default to whichever queue spawned
> it, I suppose; that may make more sense.  But the current behavior is
> "default unless you set one", which feels intuitive.
>
>
> [1]
> https://github.com/apache/airflow/blob/ed237dff7c9c6ef6a25be34a243f5645ab0ccf67/providers/celery/src/airflow/providers/celery/executors/celery_executor.py#L173
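A minimal sketch of the "default unless you set one" rule described above, next to the possible alternative of inheriting the queue that spawned the callback. This is not the Celery executor's code from [1]; `CallbackRequest`, `resolve_queue`, `origin_queue`, and `inherit_origin` are hypothetical names used only for illustration, and the literal queue name "default" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_QUEUE = "default"  # assumed name of the fallback queue


@dataclass
class CallbackRequest:
    """Hypothetical stand-in for a deadline callback handed to an executor."""
    name: str
    queue: Optional[str] = None         # queue set by the user, if any
    origin_queue: Optional[str] = None  # queue of the work that spawned it


def resolve_queue(cb: CallbackRequest, inherit_origin: bool = False) -> str:
    """Pick the queue a callback would be routed to.

    inherit_origin=False models the current behavior described in the thread.
    inherit_origin=True models the suggested alternative (not implemented).
    """
    if cb.queue:
        return cb.queue
    if inherit_origin and cb.origin_queue:
        return cb.origin_queue
    return DEFAULT_QUEUE


if __name__ == "__main__":
    cb = CallbackRequest("notify_slack", origin_queue="gpu")
    print(resolve_queue(cb))                       # default
    print(resolve_queue(cb, inherit_origin=True))  # gpu
```

With an explicit `queue` set on the callback, both variants return that queue; the two only differ when the user sets nothing.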
> ________________________________
> From: Amogh Desai <[email protected]>
> Sent: Tuesday, March 10, 2026 3:08 AM
> To: [email protected] <[email protected]>
> Subject: RE: [EXT] [AIP-86] Deadline Callback queuing and priority
>
> Apologies for the delayed response, I was OOO last week.
>
> Thanks for writing this up, Dennis -- really helpful to have the rationale
> documented in one place.
>
> The reasoning behind the "two collection" approach, with callbacks having
> higher priority than tasks, makes sense to me.
>
> I think shipping with FIFO and no `max_executor_callbacks` for now is fine
> too.
>
> Now, building on Vikram's question, one thing that might be worth
> clarifying: in a multi-queue executor setup (Celery with multiple
> workers/queues), which *queue* do the callbacks get routed to when the
> executor actually dispatches them to a worker? Is it always the default
> queue, or is there some sort of affinity to the other queues? It might be
> worth documenting even if the answer is a no-brainer.
>
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Wed, Mar 4, 2026 at 5:42 AM Ferruzzi, Dennis <[email protected]>
> wrote:
>
> > We are going to go round and round on the "I think I know what you mean,
> > but..." carousel 😄
> >
> > I suspect you are thinking in terms of Airflow Queues like the Celery
> > Queue where you can direct tasks to a specific Celery worker.  I was using
> > "callback queue" to refer to an internal data structure within the
> > executor, not a user-facing Queue.  I could have easily called it a FIFO
> > collection or list or whatever, and maybe I should have.
> >
> > Internally, the executor has two collections:  one stores the callbacks in
> > FIFO order, and one stores the prioritized tasks.  When a slot opens up, it
> > takes the top of the callback list if any are available, then fills the
> > remaining slots from the list of tasks.  From the user's point of view it's
> > one prioritized queue where all callbacks get top priority.
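The two-collection behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Airflow implementation: the class `TwoCollectionExecutor` and its methods are hypothetical, tasks and callbacks are plain strings, and a larger `priority_weight` is assumed to run first.

```python
import heapq
from collections import deque
from itertools import count


class TwoCollectionExecutor:
    """Hypothetical executor: FIFO callbacks first, then prioritized tasks."""

    def __init__(self, parallelism: int):
        self.parallelism = parallelism  # executor-level slot limit
        self.running = 0                # slots currently in use
        self._callbacks = deque()       # FIFO collection of callbacks
        self._tasks = []                # heap of (-weight, seq, task)
        self._seq = count()             # tie-breaker keeps equal weights FIFO

    def queue_callback(self, callback: str) -> None:
        self._callbacks.append(callback)

    def queue_task(self, task: str, priority_weight: int) -> None:
        heapq.heappush(self._tasks, (-priority_weight, next(self._seq), task))

    def fill_open_slots(self) -> list:
        """Start work in open slots: callbacks first, then tasks by weight."""
        open_slots = max(self.parallelism - self.running, 0)
        started = []
        while open_slots > 0 and self._callbacks:
            started.append(self._callbacks.popleft())
            open_slots -= 1
        while open_slots > 0 and self._tasks:
            _, _, task = heapq.heappop(self._tasks)
            started.append(task)
            open_slots -= 1
        self.running += len(started)
        return started


if __name__ == "__main__":
    ex = TwoCollectionExecutor(parallelism=3)
    ex.queue_task("task_low", priority_weight=1)
    ex.queue_task("task_high", priority_weight=10)
    ex.queue_callback("cb_1")
    ex.queue_callback("cb_2")
    print(ex.fill_open_slots())  # ['cb_1', 'cb_2', 'task_high']
```

From the caller's side this behaves like a single prioritized queue in which every callback outranks every task, which matches the description in the message.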
> >
> > I suspect there is confusion because I've somewhat overloaded the term
> > Queue.  I'm not sure there is any configuration to update.
> >
> > - ferruzzi
> > ________________________________
> > From: Vikram Koka via dev <[email protected]>
> > Sent: Tuesday, March 3, 2026 8:54 AM
> > To: [email protected] <[email protected]>
> > Cc: Vikram Koka <[email protected]>
> > Subject: RE: [EXT] [AIP-86] Deadline Callback queuing and priority
> >
> > Dennis,
> >
> > Thank you for writing this up.
> >
> > I completely missed this concept of “callback queue” as a parallel FIFO to
> > the task queue during the AIP discussions and in the AIP document itself.
> > So, this definitely helps clarify that part for me.
> >
> > I understand the comment about honoring the parallelism config vs. not
> > honoring the DAG level parallelism configurations.
> >
> > What I am not clear about is how this “callback queue” maps to the
> > existing queues already configured within Airflow deployments when
> > multiple queues are configured. I can guess, but would prefer to have a
> > definitive answer based on your thinking and implementation.
> > This then leads to the updated configuration needed to take advantage of
> > this feature.
> >
> > Best regards,
> > Vikram
> >
> > On Fri, Feb 27, 2026 at 3:27 PM Ferruzzi, Dennis <[email protected]>
> > wrote:
> >
> > > I think a lot of this has been discussed in dev calls but maybe never got
> > > documented anywhere for folks who aren't/weren't on the call.  I haven't
> > > been very good at all about keeping the AIP up to date after it was
> > > approved, or keeping community decisions and discussions easily
> > > discoverable, and that's on me.  Now that folks can see the feature
> > > as-implemented, Amogh raised some good questions that are worth
> > > addressing here so the rationale is captured somewhere persistent. I also
> > > want to flag a couple of ideas that came out of that discussion as
> > > potential follow-up improvements.
> > >
> > > The plan has been that we treat a deadline callback as "finishing old
> > > work" and therefore all callbacks get priority over new tasks.  The core
> > > design principle here is that deadline callbacks need to be timely: if a
> > > deadline fires, the user expects to know about it promptly. A deadline
> > > callback that gets stuck behind the same bottleneck it's trying to alert
> > > you about isn't useful.
> > >
> > > To that end, what I implemented was effectively two queues:
> > > task_instances get queued by priority_weight (as they "always" have), and
> > > callbacks get queued in FIFO order.  When the scheduler looks for new
> > > work it will always fill slots from the callback queue first.
> > > Parallelism is honored, but pools and max_active_tasks_per_dag are
> > > ignored.  The reason for that is twofold.  Let's say a user has a Dag
> > > with a deadline at 10 minutes.  There are two cases where that deadline
> > > can be triggered:
> > >
> > > Case 1) This Dag is still running at the deadline, and the callback
> > > triggers (maybe a Slack message that the report is still being generated
> > > and may be delayed today) while the Dag still runs.
> > >
> > > In this case, `pools` and `tasks_per_dag` may make sense as the Dag is
> > > still active, but I'd argue that in this case the deadline should "break
> > > through" that pool/task limit and execute regardless.  For example, if
> > > the issue is that the tasks stalled because the worker is hitting
> > > resource constraints, then tacking the deadline callback on behind that
> > > roadblock and waiting for them to finish before alerting the user that
> > > there is a problem defeats the intent of the Deadline.  It is still bound
> > > by the executor-level parallelism so it's not allowed to just run
> > > rampant, but it isn't bound by the Dag-level constraints.
> > >
> > > Case 2) The Dag failed at 2 minutes and the callback is triggered at the
> > > 2-minute point.  At this point there is no `pool` or `max_tasks` to
> > > consider.  The task instances should have released their slots and aren't
> > > being counted toward the active tasks.  Even if other Dags have started
> > > up and are claiming pool slots, this callback isn't part of that Dag and
> > > shouldn't be lumped with its tasks.
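The rule that falls out of the two cases above (callbacks honor executor-level parallelism but bypass pool and per-Dag limits) might be expressed like this. It is a sketch only: `SlotSnapshot` and `can_start` are hypothetical names, and the real scheduler and executor split these checks across different components.

```python
from dataclasses import dataclass


@dataclass
class SlotSnapshot:
    """Hypothetical view of free capacity at scheduling time."""
    open_executor_slots: int  # remaining executor-level parallelism
    open_pool_slots: int      # remaining slots in the task's pool
    open_dag_task_slots: int  # remaining max_active_tasks_per_dag budget


def can_start(kind: str, slots: SlotSnapshot) -> bool:
    """Return True if a workload of the given kind may start now.

    kind is "callback" or "task" (labels assumed for this sketch).
    """
    # Everything, callbacks included, is bound by executor parallelism.
    if slots.open_executor_slots <= 0:
        return False
    # Callbacks "break through" pool and Dag-level limits.
    if kind == "callback":
        return True
    # Tasks must also fit within their pool and their Dag's task limit.
    return slots.open_pool_slots > 0 and slots.open_dag_task_slots > 0


if __name__ == "__main__":
    saturated_pool = SlotSnapshot(open_executor_slots=2,
                                  open_pool_slots=0,
                                  open_dag_task_slots=0)
    print(can_start("callback", saturated_pool))  # True
    print(can_start("task", saturated_pool))      # False
```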
> > >
> > > A `max_executor_callbacks` setting which parallels max_executor_tasks is
> > > one idea that has come up.  It's not a bad idea; I guess if the whole
> > > building is burning down, you don't need to know that each floor is
> > > burning. It feels a bit against the intention of the callbacks getting
> > > prioritized over tasks, but if it's a user-defined option then that's on
> > > the user. It might be worth considering as a follow-up improvement if
> > > anyone thinks it's something we really need.
> > >
> > > The other decision that seems to be contentious is that callbacks are
> > > FIFO instead of implementing a `callback-priority-weight`.  FIFO seems
> > > fine for a callback queue and how I envision the feature being used, but
> > > maybe once the feature gets in the hands of users they'll find a need for
> > > it.  We as a community have been saying for a while that there are way
> > > too many user-facing knobs in the settings and this felt like a logical
> > > place for us to be opinionated in the code.  With FIFO, the feature's
> > > behavior is straightforward and predictable, and it's easier to add the
> > > weight later if there is demand than it would be to remove it later if we
> > > decide to prune the config options in a future version.
> > >
> > > For now, I think we're in a good place to ship with the current behavior
> > > and we can iterate based on real-world usage.  I'll cut some Issues to
> > > make sure the follow-up ideas from here and the PR are tracked so they
> > > don't get lost.  If anyone has concerns about the current behavior that
> > > should be addressed before launch, let me know.
> > >
> > >
> > > - ferruzzi
> > >
> >
>
