+1 for option 2, primarily because of:

> It would be more robust and resilient, and therefore be able to run the
> callbacks *even in the presence of certain kinds of issues, like the
> scheduler being bogged down*
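
To make that concrete, here is a rough sketch of the kind of loop I picture
for option 2. Everything in it is hypothetical: `DeadlineProcessorSketch`,
`PendingDeadline`, and the in-memory heap are illustrative stand-ins, not
anything from AIP-86 or the Airflow codebase (a real implementation would
presumably read deadlines from the metadata database instead).

```python
import heapq
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass(order=True)
class PendingDeadline:
    """Hypothetical record of one deadline; not an Airflow model."""

    due: datetime
    seq: int  # tie-breaker so equal due times never compare callbacks
    callback: Callable[[], None] = field(compare=False)


class DeadlineProcessorSketch:
    """Hypothetical stand-in for the proposed DeadlineProcessor process."""

    def __init__(self) -> None:
        # Assumption: deadlines live in an in-memory min-heap keyed on due time.
        self._heap: list[PendingDeadline] = []
        self._counter = itertools.count()

    def add(self, due: datetime, callback: Callable[[], None]) -> None:
        heapq.heappush(
            self._heap, PendingDeadline(due, next(self._counter), callback)
        )

    def run_once(self, now: datetime) -> int:
        """Run the callback of every deadline due at or before `now`.

        Callbacks run in this process, with no executor queue in between.
        Returns the number of callbacks attempted.
        """
        attempted = 0
        while self._heap and self._heap[0].due <= now:
            deadline = heapq.heappop(self._heap)
            try:
                deadline.callback()
            except Exception as exc:
                # One failing callback should not block the remaining ones.
                print(f"deadline callback failed: {exc!r}")
            attempted += 1
        return attempted


if __name__ == "__main__":
    processor = DeadlineProcessorSketch()
    start = datetime.now(timezone.utc)
    processor.add(start - timedelta(seconds=1), lambda: print("deadline missed"))
    processor.run_once(datetime.now(timezone.utc))
```

A standalone process would call `run_once(datetime.now(timezone.utc))` in a
loop with a sleep between passes. Since the loop only looks at the earliest
pending deadline, each pass is cheap, and a slow scheduler would not delay
the callbacks.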


On Wed, May 21, 2025 at 5:09 PM Kataria, Ramit <ramit...@amazon.com.invalid>
wrote:

> Hi all,
>
> I’m working with Dennis on Deadline Alerts (AIP-86). I'd like to discuss
> implementation approaches for executing callbacks when Deadline Alerts are
> triggered. As you may know, the old SLA feature has been removed, and we're
> planning to introduce Deadline Alerts as a replacement in 3.1. When a
> deadline is missed, we need a mechanism to execute callbacks (which could
> be notifications or other actions).
>
> I’ve identified two main approaches:
>
> Option 1: Scheduler-based
> In this approach, the scheduler would check on a regular interval whether
> the earliest deadline has passed and, if so, queue the callback to run in
> an executor (local or remote). The executor would be specified when
> creating the deadline alert; if none is specified, the default executor
> would be used.
>
> Option 2: New DeadlineProcessor process
> In this approach, a new process, similar to the triggerer or
> dag-processor and completely independent of the scheduler, would check
> for deadlines on a regular interval and run the callbacks itself, without
> queueing them in an executor.
>
> Multi-team considerations: For multi-team later this year, option 2 would
> be relatively simple to implement. However, for option 1, the callbacks
> would have to run on a remote executor since there would be no local
> executor.
>
> I recommend going with option 2 because:
>
>   *   It would be more robust and resilient, and therefore be able to run
> the callbacks even in the presence of certain kinds of issues, like the
> scheduler being bogged down
>   *   It would also run the callbacks almost instantly instead of having
> to wait for an executor (especially if there’s a long queue of tasks or a
> cold-start delay)
>      *   This could be mitigated by implementing a priority system where
> the deadline callbacks are prioritized over regular tasks, but as far as I
> understand Airflow’s architecture, this is a non-trivial problem
>   *   It would avoid a potential slight increase in workload for the
> scheduler
>      *   The additional workload in the scheduler for option 1 would be
> checking to see if the earliest deadline has passed on a regular interval
>
> However, it would introduce another process for admins to deploy and
> manage, and also likely require more effort to implement, therefore taking
> longer to complete.
>
> So, I’d like to hear your thoughts on these approaches, anything I may
> have missed and if you agree/disagree with this direction. Thank you for
> your input!
>
>
> Best,
>
> Ramit Kataria
> SDE at AWS
>
