Currently the Celery executor's implementation pushes all callbacks to the 
"default" queue unless the user defined a queue in the callback definition [1]. 
We could add some logic to make it default to whichever queue spawned it, I 
suppose; that may make more sense.  But the current behavior is "default 
unless you set one," which feels intuitive.
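For illustration, that "default unless you set one" routing amounts to something like the sketch below. The names here (`DEFAULT_QUEUE`, `resolve_callback_queue`) are illustrative only, not the actual executor internals:

```python
# Hedged sketch of the queue-selection behavior described above: a callback
# runs on the queue named in its definition, falling back to the executor's
# default queue. Names are illustrative, not the real celery_executor code.
DEFAULT_QUEUE = "default"

def resolve_callback_queue(callback_queue=None):
    """Return the Celery queue a deadline callback would be sent to."""
    return callback_queue if callback_queue else DEFAULT_QUEUE

# A callback with no queue set lands on the default queue...
assert resolve_callback_queue() == "default"
# ...while one with a queue in its definition keeps that queue.
assert resolve_callback_queue("high_memory_workers") == "high_memory_workers"
```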


[1] 
https://github.com/apache/airflow/blob/ed237dff7c9c6ef6a25be34a243f5645ab0ccf67/providers/celery/src/airflow/providers/celery/executors/celery_executor.py#L173
________________________________
From: Amogh Desai <[email protected]>
Sent: Tuesday, March 10, 2026 3:08 AM
To: [email protected] <[email protected]>
Subject: RE: [EXT] [AIP-86] Deadline Callback queuing and priority

CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.



Apologies for the delayed response, I was OOO last week.

Thanks for writing this up, Dennis -- really helpful to have the rationale
documented in one place.

The reasoning behind the "two collection" approach, with callbacks having
higher priority than tasks, makes sense to me.

I think shipping with FIFO and no `max_executor_callbacks` for now is fine
too.

Now, building on Vikram's question, one thing that might be worth clarifying
is: in a multi-queue executor setup (Celery with multiple workers/queues),
which *queue* do the callbacks get routed to when the executor actually
dispatches the callback to a worker? Is it always the default queue, or is
there some sort of affinity to the other queues? Might be worth documenting
it even if the answer is a no-brainer.


Thanks & Regards,
Amogh Desai


On Wed, Mar 4, 2026 at 5:42 AM Ferruzzi, Dennis <[email protected]> wrote:

> We are going to go round and round on the "I think I know what you mean,
> but..." carousel 😄
>
> I suspect you are thinking in terms of Airflow Queues like the Celery
> Queue where you can direct tasks to a specific Celery worker.  I was using
> "callback queue"  to refer to an internal data structure within the
> executor, not a user-facing Queue.  I could have easily called it a FIFO
> collection or list or whatever, and maybe I should have.
>
> Internally, the executor has two collections:  one stores the callbacks in
> FIFO order, and one stores the prioritized tasks.  When a slot opens up, it
> takes the top of the callback list if any are available, then fills the
> remaining slots from the list of tasks.  From the user's point of view it's
> one prioritized queue where all callbacks get top priority.
>
> I suspect there is confusion because I've somewhat overloaded the term
> Queue.  I'm not sure there is any configuration to update.
>
> - ferruzzi
> ________________________________
> From: Vikram Koka via dev <[email protected]>
> Sent: Tuesday, March 3, 2026 8:54 AM
> To: [email protected] <[email protected]>
> Cc: Vikram Koka <[email protected]>
> Subject: RE: [EXT] [AIP-86] Deadline Callback queuing and priority
>
>
> Dennis,
>
> Thank you for writing this up.
>
> I completely missed this concept of “callback queue” as a parallel FIFO to
> the task queue during the AIP discussions and in the AIP document itself.
> So, this definitely helps clarify that part for me.
>
> I understand the comment about honoring the parallelism config vs. not
> honoring the DAG level parallelism configurations.
>
> What I am not clear about is how this “callback queue” maps to the
> existing queues already configured within the Airflow deployments, when
> multiple queues are configured. I can guess, but would prefer to have a
> definitive answer based on your thinking and implementation.
> This then leads to the updated configuration needed to take advantage of
> this feature.
>
> Best regards,
> Vikram
>
> On Fri, Feb 27, 2026 at 3:27 PM Ferruzzi, Dennis <[email protected]>
> wrote:
>
> > I think a lot of this has been discussed in dev calls but maybe never got
> > documented anywhere for folks who aren't/weren't on the call.  I haven't
> > been very good about keeping the AIP up to date after it was approved, or
> > keeping community decisions and discussions easily discoverable, and
> > that's on me.  Now that folks can see the feature as-implemented, Amogh
> > raised some good questions that are worth addressing here so the
> > rationale is captured somewhere persistent.  I also want to flag a couple
> > of ideas that came out of that discussion as potential follow-up
> > improvements.
> >
> > The plan has been that we treat a deadline callback as "finishing old
> > work" and therefore all callbacks get priority over new tasks.  The core
> > design principle here is that deadline callbacks need to be timely — if a
> > deadline fires, the user expects to know about it promptly. A deadline
> > callback that gets stuck behind the same bottleneck it's trying to alert
> > you about isn't useful.
> >
> > To that end, what I implemented was effectively two queues:
> > task_instances get queued by priority_weight (as they "always" have),
> > and callbacks get queued in FIFO order.  When the scheduler looks for
> > new work it will always fill slots from the callback queue first.
> > Parallelism is honored, but pools and max_active_tasks_per_dag are
> > ignored.  The reason for that is twofold.  Let's say a user has a Dag
> > with a 10-minute deadline.  There are two cases where that deadline can
> > be triggered:
> >
> > Case 1) This Dag is still running at the deadline, the callback triggers -
> > maybe a Slack message that the report is still being generated and may be
> > delayed today - while the Dag still runs.
> >
> > In this case, `pools` and `tasks_per_dag` may make sense since the Dag is
> > still active, but I'd argue that the deadline callback should "break
> > through" that pool/task limit and execute regardless.  For example, if the
> > issue is that the tasks stalled because the worker is hitting resource
> > constraints, then tacking the deadline callback on behind that roadblock
> > and waiting for them to finish before alerting the user that there is a
> > problem defeats the intent of the Deadline.  The callback is still bound
> > by the executor-level parallelism so it's not allowed to just run
> > rampant, but it isn't bound by the dag-level constraints.
> >
> > Case 2) The Dag failed at 2 minutes and the callback is triggered at the
> > 2-minute point.  At this point there is no `pool` or `max_tasks` to
> > consider.  The task instances should have released their slots and aren't
> > being counted toward the active tasks.  Even if other Dags have started
> > up and are claiming pool slots, this callback isn't part of those Dags
> > and shouldn't be lumped with their tasks.
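To make the slot-filling order above concrete, the two-collection idea could be sketched roughly as below: a FIFO deque for callbacks and a priority heap for tasks, drained in that order when slots open up. This is an illustration of the described behavior under my own naming (`TwoQueueSketch`, `fill_slots`), not the actual scheduler code:

```python
import heapq
from collections import deque

class TwoQueueSketch:
    """Illustrative sketch only: callbacks drain in FIFO order before any
    tasks are picked; tasks drain by priority_weight (higher weight first)."""

    def __init__(self):
        self.callbacks = deque()  # FIFO collection of pending callbacks
        self.tasks = []           # heap of (-priority_weight, seq, name)
        self._seq = 0             # insertion counter to break weight ties

    def add_callback(self, name):
        self.callbacks.append(name)

    def add_task(self, name, priority_weight):
        # Negate the weight so the highest priority_weight pops first.
        heapq.heappush(self.tasks, (-priority_weight, self._seq, name))
        self._seq += 1

    def fill_slots(self, open_slots):
        """Pick work for the open slots: callbacks first, then tasks."""
        picked = []
        while open_slots and self.callbacks:
            picked.append(self.callbacks.popleft())
            open_slots -= 1
        while open_slots and self.tasks:
            _, _, name = heapq.heappop(self.tasks)
            picked.append(name)
            open_slots -= 1
        return picked
```

With two callbacks and two tasks queued and three open slots, both callbacks are selected before the highest-weight task, matching the "callbacks always fill first" behavior described above.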
> >
> > A `max_executor_callbacks` setting which parallels max_executor_tasks is
> > one idea that has come up.  It's not a bad idea; I guess if the whole
> > building is burning down, you don't need to know that each floor is
> > burning. It feels a bit against the intention of the callbacks getting
> > prioritized over tasks, but if it's a user-defined option then that's on
> > the user. It might be worth considering as a follow-up improvement if
> > anyone thinks it's something we really need.
> >
> > The other decision that seems to be contentious is that callbacks are
> > FIFO instead of implementing a `callback-priority-weight`.  FIFO seems
> > fine for a callback queue and how I envision the feature being used, but
> > maybe once the feature gets in the hands of users they'll find a need
> > for it.  We as a community have been saying for a while that there are
> > way too many user-facing knobs in the settings and this felt like a
> > logical place for us to be opinionated in the code.  With FIFO, the
> > feature's behavior is straightforward and predictable, and it's easier
> > to add the weight later if there is demand than it would be to remove it
> > later if we decide to prune the config options in a future version.
> >
> > For now, I think we're in a good place to ship with the current behavior
> > and we can iterate based on real-world usage.  I'll cut some Issues to
> > make sure the follow-up ideas from here and the PR are tracked so they
> > don't get lost.  If anyone has concerns about the current behavior that
> > should be addressed before launch, let me know.
> >
> >
> > - ferruzzi
> >
>
