Yeah, that’s what I understood it to mean, and then the callback would go to a worker (or a trigger) to run the user code.

-a

> On 22 May 2025, at 16:06, Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Option 1 as originally proposed in this thread only does the "is a
>> callback required" check in scheduler -- not running the callback in
>> scheduler
>
> Ah - then OK. I thought it was also callback execution. Just checking
> is fine in scheduler.
>
> On Thu, May 22, 2025 at 4:55 PM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
>
>> Option 1 as originally proposed in this thread only does the "is a
>> callback required" check in scheduler -- not running the callback in
>> scheduler.
>>
>> On Thu, May 22, 2025 at 7:22 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>>> So I very strongly vote for Option 1, and if needed make the
>>>> scheduler itself more resilient. The Airflow Scheduler _IS_ airflow.
>>>> Let’s do what we need to in order to make it more stable, rather
>>>> than working around a problem of our own making, whilst also making
>>>> it operationally more complex to run.
>>>
>>> Hey Ash - I forgot to add. Option 1 is against our new security
>>> model. This is essentially DAG author code executed in the scheduler.
>>> Ash - do you think it is possible to avoid that? For DAG parsing it
>>> resulted in a mandatory dag-processor command separated from the
>>> scheduler, so I am not sure how we would solve the security issue
>>> here? Or maybe there is another idea on how to solve it? That would
>>> be possible if we had deadline callbacks defined in the plugins, but
>>> again - I think the idea was to be able to provide callbacks by DAG
>>> authors (which IMHO is synonymous with "we do not run it in
>>> scheduler").
>>>
>>> We could potentially run the callbacks in the Dag processor (which we
>>> already did, BTW), but I am not sure if this is what we want.
>>>
>>> J.
>>>
>>> On Thu, May 22, 2025 at 3:40 PM Elad Kalif <elad...@apache.org> wrote:
>>>
>>>> My comment on the name is for the suggested component that runs the
>>>> workload.
>>>> It's not about the feature itself. I just suggest a more generic
>>>> name so if the need comes it would be easier to execute different
>>>> kinds of workloads on it (like callbacks).
>>>>
>>>> As for reusing the Triggerer, I am not a fan of that. It serves a
>>>> completely different purpose, and combining both cases may result in
>>>> poor usage of auto scaling. I don't think alerts/callbacks/other
>>>> "misc" should compete on the same resources as actual tasks.
>>>>
>>>> On Thu, 22 May 2025, 16:19, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> How about Option 3) making it part of triggerer.
>>>>>
>>>>> I think that goes in the direction we've been discussing in the
>>>>> past where we have "generic workload" that we can submit from any
>>>>> of the other components that will be executed in triggerer.
>>>>>
>>>>> * that would not add too much complexity - no extra process to
>>>>>   manage
>>>>> * triggerer is an obligatory part of installation now anyway
>>>>> * usually machines today have more processors, and triggerer, with
>>>>>   its event loop, does not seem to be too busy in terms of
>>>>>   multi-processor usage (there are extra processes accessing the DB
>>>>>   but still not much I think). It could fork another process to run
>>>>>   just deadline checks.
>>>>> * re - multi-team it's even easier, triggerer is already going to
>>>>>   be "per-team".
>>>>> * we could even rename triggerer to "generic workload processor"
>>>>>   (well, a shorter name, but to indicate that it could process any
>>>>>   kind of workloads - not only deferred triggers).
>>>>>
>>>>> Re: comments from Elad:
>>>>>
>>>>> 1) Naming wise: I think we settled on the name already (looong
>>>>> discussion, naming is hard) and I think the scope of it is just
>>>>> really "deadlines" (we also wanted to distinguish it from SLA) - I
>>>>> like the name for this particular callback type, but yes - I agree
>>>>> it should be more generic, open for any future types of callbacks.
>>>>> If we go for triggerer handling "generic workload" - that is IMHO
>>>>> "generic enough" to handle any future workloads.
>>>>>
>>>>> 2) I believe this is something that could be handled by the
>>>>> callback. Callback could have the option to be able to submit a
>>>>> "cancel" request for the task it is called back for (via task.sdk
>>>>> API) - but that should be up to the one who writes the callback.
>>>>>
>>>>> J.
>>>>>
>>>>> On Thu, May 22, 2025 at 10:03 AM Elad Kalif <elad...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I prefer option 2 but I have questions.
>>>>>> 1. Naming wise maybe we should prefer a more generic name as I am
>>>>>> not sure if it should be limited to deadlines? (maybe should be
>>>>>> shared with executing callbacks?)
>>>>>> 2. How do you plan to manage the queue of alerts? What happens if
>>>>>> the process is unhealthy while workers continue to execute tasks?
>>>>>>
>>>>>> On Thu, May 22, 2025 at 12:56 AM Ryan Hatter
>>>>>> <ryan.hat...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> +1 for option 2, primarily because of:
>>>>>>>
>>>>>>>> It would be more robust and resilient, and therefore be able to
>>>>>>>> run the callbacks *even in presence of certain kinds of issues
>>>>>>>> like the scheduler being bogged-down*
>>>>>>>
>>>>>>> On Wed, May 21, 2025 at 5:09 PM Kataria, Ramit
>>>>>>> <ramit...@amazon.com.invalid> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I’m working with Dennis on Deadline Alerts (AIP-86).
>>>>>>>> I'd like to discuss implementation approaches for executing
>>>>>>>> callbacks when Deadline Alerts are triggered. As you may know,
>>>>>>>> the old SLA feature has been removed, and we're planning to
>>>>>>>> introduce Deadline Alerts as a replacement in 3.1. When a
>>>>>>>> deadline is missed, we need a mechanism to execute callbacks
>>>>>>>> (which could be notifications or other actions).
>>>>>>>>
>>>>>>>> I’ve identified two main approaches:
>>>>>>>>
>>>>>>>> Option 1: Scheduler-based
>>>>>>>> In this approach, the scheduler would check on a regular
>>>>>>>> interval to see if the earliest deadline has passed and then
>>>>>>>> queue the callback to run in an executor (local or remote). The
>>>>>>>> executor would be specified when creating the deadline alert and
>>>>>>>> if there’s none specified, then the default executor would be
>>>>>>>> used.
>>>>>>>>
>>>>>>>> Option 2: New DeadlineProcessor process
>>>>>>>> In this approach, there would be a new process similar to
>>>>>>>> triggerer/dag-processor, completely independent from the
>>>>>>>> scheduler, to check for deadlines on a regular interval and also
>>>>>>>> run the callbacks without queueing them in another executor.
>>>>>>>>
>>>>>>>> Multi-team considerations: For multi-team later this year,
>>>>>>>> option 2 would be relatively simple to implement. However, for
>>>>>>>> option 1, the callbacks would have to run on a remote executor
>>>>>>>> since there would be no local executor.
>>>>>>>>
>>>>>>>> I recommend going with option 2 because:
>>>>>>>>
>>>>>>>> * It would be more robust and resilient, and therefore be able
>>>>>>>>   to run the callbacks even in presence of certain kinds of
>>>>>>>>   issues like the scheduler being bogged-down
>>>>>>>> * It would also run the callbacks almost instantly instead of
>>>>>>>>   having to wait for an executor (especially if there’s a long
>>>>>>>>   queue of tasks or a cold-start delay)
>>>>>>>>     * This could be mitigated by implementing a priority system
>>>>>>>>       where the deadline callbacks are prioritized over regular
>>>>>>>>       tasks but this is a non-trivial problem with my current
>>>>>>>>       understanding of Airflow’s architecture
>>>>>>>> * It would avoid a potential slight increase in workload for the
>>>>>>>>   scheduler
>>>>>>>>     * The additional workload in the scheduler for option 1
>>>>>>>>       would be checking to see if the earliest deadline has
>>>>>>>>       passed on a regular interval
>>>>>>>>
>>>>>>>> However, it would introduce another process for admins to deploy
>>>>>>>> and manage, and also likely require more effort to implement,
>>>>>>>> therefore taking longer to complete.
>>>>>>>>
>>>>>>>> So, I’d like to hear your thoughts on these approaches, anything
>>>>>>>> I may have missed and if you agree/disagree with this direction.
>>>>>>>> Thank you for your input!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Ramit Kataria
>>>>>>>> SDE at AWS

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org
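
For readers following the thread, Option 1 as clarified above (the scheduler only checks whether the earliest deadline has passed, then hands the callback to an executor, falling back to the default executor when none was specified) can be sketched roughly as below. This is only an illustration under stated assumptions, not Airflow code: `Deadline`, `DeadlineChecker`, `callback_path` and the `queued` list are hypothetical names, and the executor is represented by a plain string rather than any real Airflow API.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(order=True)
class Deadline:
    """Hypothetical deadline record; ordering uses only the due time."""
    due: datetime
    callback_path: str = field(compare=False)
    executor: Optional[str] = field(default=None, compare=False)


class DeadlineChecker:
    """Hypothetical scheduler-side check.

    It decides *whether* a callback is due and records where it should
    run. It never imports or executes the callback itself.
    """

    def __init__(self, default_executor: str) -> None:
        self.default_executor = default_executor
        self._heap: List[Deadline] = []  # min-heap keyed on due time
        # Stand-in for "queue the callback to run in an executor":
        # (executor name, callback path) pairs, in the order queued.
        self.queued: List[Tuple[str, str]] = []

    def add(self, deadline: Deadline) -> None:
        heapq.heappush(self._heap, deadline)

    def check(self, now: datetime) -> int:
        """Queue every callback whose deadline has passed; return the count."""
        count = 0
        # Only the earliest deadline is inspected on each iteration, so a
        # check with nothing due costs a single comparison.
        while self._heap and self._heap[0].due <= now:
            missed = heapq.heappop(self._heap)
            target = missed.executor or self.default_executor
            self.queued.append((target, missed.callback_path))
            count += 1
        return count
```

In this sketch the scheduler loop would call `check(now)` on its regular interval. Keeping the deadlines in a min-heap is one way to make the "has the earliest deadline passed" test cheap, which is the extra scheduler workload the original message describes for Option 1.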