Re: [DISCUSS] Deadline Alert Callbacks

Ash Berlin-Taylor Thu, 22 May 2025 08:17:26 -0700

Yeah, that’s how I understood it too, and then the callback would go to a
worker (or a trigger) to run the user code.
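
A minimal sketch of that split, assuming nothing about the eventual AIP-86
implementation: the scheduler side only decides whether a callback is due and
queues it, and a worker (or trigger) later runs the user code. `Deadline`,
`queue_missed_deadlines`, `run_queued_callbacks`, the `callback_path` key and
the plain `Queue` standing in for an executor are all hypothetical names, not
Airflow APIs.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime
from queue import Queue


@dataclass(order=True)
class Deadline:
    """Hypothetical deadline record; ordering uses only the due time."""

    due: datetime
    callback_path: str = field(compare=False)  # assumed: key identifying user code
    executor: str = field(compare=False, default="default")


def queue_missed_deadlines(deadlines, now, callback_queue):
    """Scheduler side: queue a callback for every deadline that has passed.

    `deadlines` is a heap, so the earliest deadline is always deadlines[0].
    Only the "is a callback required" check happens here; no user code runs.
    """
    queued = 0
    while deadlines and deadlines[0].due <= now:
        missed = heapq.heappop(deadlines)
        callback_queue.put((missed.executor, missed.callback_path))
        queued += 1
    return queued


def run_queued_callbacks(callback_queue, registry):
    """Worker side: look up and run the user callbacks that were queued."""
    results = []
    while not callback_queue.empty():
        _executor, path = callback_queue.get()
        results.append(registry[path]())  # user code executes here, not in the scheduler
    return results
```

The point of the sketch is only the boundary: everything that touches DAG
author code lives in `run_queued_callbacks`, which would not run in the
scheduler process.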


-a

> On 22 May 2025, at 16:06, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
>> Option 1 as originally proposed in this thread only does the "is a callback
>> required" check in scheduler -- not running the callback in scheduler
> 
> Ah - then OK. I thought it also covered callback execution. Just checking is
> fine in the scheduler.
> 
> On Thu, May 22, 2025 at 4:55 PM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
> 
>> Option 1 as originally proposed in this thread only does the "is a callback
>> required" check in scheduler -- not running the callback in scheduler.
>> 
>> On Thu, May 22, 2025 at 7:22 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>> 
>>>> So I very strongly vote for Option 1, and if needed make the scheduler
>>>> itself more resilient. The Airflow Scheduler _IS_ airflow. Let’s do what
>>>> we need to in order to make it more stable, rather than working around a
>>>> problem of our own making, whilst also making it operationally more
>>>> complex to run.
>>> 
>>> Hey Ash - I forgot to add. Option 1 is against our new security model.
>>> This is essentially DAG author code executed in the scheduler. Ash - do
>>> you think it is possible to avoid that? For DAG parsing it resulted in a
>>> mandatory dag-processor command separated from the scheduler, so I am not
>>> sure how we would solve the security issue here. Or maybe there is another
>>> idea on how to solve it? That would be possible if we had deadline
>>> callbacks defined in the plugins, but again - I think the idea was to be
>>> able to provide callbacks by DAG authors (which IMHO is synonymous with
>>> "we do not run it in scheduler").
>>> 
>>> We could potentially run the callbacks in the Dag processor (which we
>>> already did BTW), but I am not sure if this is what we want.
>>> 
>>> J.
>>> 
>>> 
>>> On Thu, May 22, 2025 at 3:40 PM Elad Kalif <elad...@apache.org> wrote:
>>> 
>>>> My comment on the name is for the suggested component that runs the
>>>> workload. It's not about the feature itself. I just suggest a more
>>>> generic name so that, if the need comes, it would be easier to execute
>>>> different kinds of workloads on it (like callbacks).
>>>>
>>>> As for reusing the Triggerer, I am not a fan of that. It serves a
>>>> completely different purpose, and combining both cases may result in poor
>>>> usage of auto scaling. I don't think alerts/callbacks/other "misc" should
>>>> compete for the same resources as actual tasks.
>>>> 
>>>> On Thu, May 22, 2025 at 16:19, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>> 
>>>>> How about Option 3) making it part of triggerer?
>>>>>
>>>>> I think that goes in the direction we've been discussing in the past,
>>>>> where we have a "generic workload" that we can submit from any of the
>>>>> other components and that will be executed in triggerer.
>>>>>
>>>>> * that would not add too much complexity - no extra process to manage
>>>>> * triggerer is an obligatory part of the installation now anyway
>>>>> * usually machines today have more processors, and triggerer, with its
>>>>> event loop, does not seem to be too busy in terms of multi-processor
>>>>> usage (there are extra processes accessing the DB but still not much I
>>>>> think). It could fork another process to run just deadline checks.
>>>>> * re - multi-team it's even easier, triggerer is already going to be
>>>>> "per-team".
>>>>> * we could even rename triggerer to "generic workload processor" (well,
>>>>> a shorter name, but one that indicates it could process any kind of
>>>>> workload - not only deferred triggers).
>>>>> 
>>>>> Re: comments from Elad:
>>>>>
>>>>> 1) Naming wise: I think we settled on the name already (looong
>>>>> discussion, naming is hard) and I think the scope of it is just really
>>>>> "deadlines" (we also wanted to distinguish it from SLA) - I like the
>>>>> name for this particular callback type, but yes - I agree it should be
>>>>> more generic, open for any future types of callbacks. If we go for
>>>>> triggerer handling "generic workload" - that is IMHO "generic enough" to
>>>>> handle any future workloads.
>>>>>
>>>>> 2) I believe this is something that could be handled by the callback.
>>>>> A callback could have the option to submit a "cancel" request for the
>>>>> task it is called back for (via task.sdk API) - but that should be up
>>>>> to the one who writes the callback.
>>>>> 
>>>>> J.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, May 22, 2025 at 10:03 AM Elad Kalif <elad...@apache.org> wrote:
>>>>> 
>>>>>> I prefer option 2 but I have questions.
>>>>>> 1. Naming wise maybe we should prefer a more generic name as I am not
>>>>>> sure if it should be limited to deadlines? (maybe should be shared with
>>>>>> executing callbacks?)
>>>>>> 2. How do you plan to manage the queue of alerts? What happens if the
>>>>>> process is unhealthy while workers continue to execute tasks?
>>>>>> 
>>>>>> On Thu, May 22, 2025 at 12:56 AM Ryan Hatter
>>>>>> <ryan.hat...@astronomer.io.invalid> wrote:
>>>>>> 
>>>>>>> +1 for option 2, primarily because of:
>>>>>>>
>>>>>>>> It would be more robust and resilient, and therefore be able to run
>>>>>>>> the callbacks *even in presence of certain kinds of issues like the
>>>>>>>> scheduler being bogged-down*
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, May 21, 2025 at 5:09 PM Kataria, Ramit
>>>>>>> <ramit...@amazon.com.invalid> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I’m working with Dennis on Deadline Alerts (AIP-86). I'd like to
>>>>>>>> discuss implementation approaches for executing callbacks when
>>>>>>>> Deadline Alerts are triggered. As you may know, the old SLA feature
>>>>>>>> has been removed, and we're planning to introduce Deadline Alerts as
>>>>>>>> a replacement in 3.1. When a deadline is missed, we need a mechanism
>>>>>>>> to execute callbacks (which could be notifications or other actions).
>>>>>>>> 
>>>>>>>> I’ve identified two main approaches:
>>>>>>>> 
>>>>>>>> Option 1: Scheduler-based
>>>>>>>> In this approach, the scheduler would check on a regular interval to
>>>>>>>> see if the earliest deadline has passed and then queue the callback
>>>>>>>> to run in an executor (local or remote). The executor would be
>>>>>>>> specified when creating the deadline alert and if there’s none
>>>>>>>> specified, then the default executor would be used.
>>>>>>>> 
>>>>>>>> Option 2: New DeadlineProcessor process
>>>>>>>> In this approach, there would be a new process similar to
>>>>>>>> triggerer/dag-processor completely independent from the scheduler to
>>>>>>>> check for deadlines on a regular interval and also run the callbacks
>>>>>>>> without queueing them in another executor.
>>>>>>>> 
>>>>>>>> Multi-team considerations: For multi-team later this year, option 2
>>>>>>>> would be relatively simple to implement. However, for option 1, the
>>>>>>>> callbacks would have to run on a remote executor since there would
>>>>>>>> be no local executor.
>>>>>>>> 
>>>>>>>> I recommend going with option 2 because:
>>>>>>>> 
>>>>>>>>  *   It would be more robust and resilient, and therefore be able to
>>>>>>>> run the callbacks even in presence of certain kinds of issues like
>>>>>>>> the scheduler being bogged-down
>>>>>>>>  *   It would also run the callbacks almost instantly instead of
>>>>>>>> having to wait for an executor (especially if there’s a long queue
>>>>>>>> of tasks or a cold-start delay)
>>>>>>>>     *   This could be mitigated by implementing a priority system
>>>>>>>> where the deadline callbacks are prioritized over regular tasks but
>>>>>>>> this is a non-trivial problem with my current understanding of
>>>>>>>> Airflow’s architecture
>>>>>>>>  *   It would avoid a potential slight increase in workload for the
>>>>>>>> scheduler
>>>>>>>>     *   The additional workload in the scheduler for option 1 would
>>>>>>>> be checking to see if the earliest deadline has passed on a regular
>>>>>>>> interval
>>>>>>>> 
>>>>>>>> However, it would introduce another process for admins to deploy and
>>>>>>>> manage, and also likely require more effort to implement, therefore
>>>>>>>> taking longer to complete.
>>>>>>>> 
>>>>>>>> So, I’d like to hear your thoughts on these approaches, anything I
>>>>>>>> may have missed and if you agree/disagree with this direction. Thank
>>>>>>>> you for your input!
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> 
>>>>>>>> Ramit Kataria
>>>>>>>> SDE at AWS
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
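
For comparison, Option 2 from the original proposal quoted above (a separate
process that checks deadlines on a regular interval and runs the callbacks
itself, with no executor queue in between) could be sketched roughly as
follows. This is an illustration only: `process_due_deadlines`,
`run_deadline_processor` and their parameters are hypothetical names, and
deadlines are modelled as plain `(due_timestamp, callback)` tuples rather than
anything Airflow defines.

```python
import time


def process_due_deadlines(pending, now):
    """Run the callback of every deadline due at or before `now`.

    Returns the deadlines that are still pending.
    """
    still_pending = []
    for due, callback in pending:
        if due <= now:
            callback()  # runs inline in this process, no executor involved
        else:
            still_pending.append((due, callback))
    return still_pending


def run_deadline_processor(pending, interval_seconds=5.0,
                           clock=time.time, sleep=time.sleep):
    """Hypothetical main loop: check on a regular interval until nothing is left."""
    while pending:
        pending = process_due_deadlines(pending, clock())
        if pending:
            sleep(interval_seconds)
    return pending
```

Because the loop does not depend on the scheduler, it would keep firing
callbacks while the scheduler is bogged down, which is the resilience argument
made for Option 2; the cost is one more process to deploy and monitor.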


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org
