syun64 commented on PR #8545:
URL: https://github.com/apache/airflow/pull/8545#issuecomment-1491091885

   Hi everyone, I've spent a lot of time collecting all reported concerns that 
the community has had regarding SLAs to date. After much deliberation, I've 
reached the conclusion that we might be better off defining the Airflow-native 
SLA feature only at the DAG level, where it can be supported to users' 
expectations in a first-class way, and leave the task-level SLA definition to 
the users. There are three main reasons why I think task-level SLAs should 
be implemented by users instead of by Airflow:
   1. Today, users can already monitor task-level SLAs through the use of 
Deferrable Operators and asynchronous DateTimeTriggers (with Task Groups to 
organize these tasks in the UI).
   2. Reliably detecting a task-level SLA miss at the moment the task actually 
misses the SLA (instead of only after the task succeeds) is possible only by 
loading the scheduler with task-level SLA detection. That is not ideal, 
because task-level SLA detection is not the primary function of a scheduler, 
and it wouldn't benefit Airflow users to compromise the scheduler in any way.
   3. Some users want to customize the way they monitor task-level SLAs. 
Some want to use different definitions of the timedelta (measured from DAG run 
start versus from task start), some want to detect task SLA misses multiple 
times (different levels of warning for delays), and some want to detect the 
SLA miss only if the target task is in a certain state (unfinished state: 
RUNNING; finished state: SUCCESS/SKIPPED).
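
To make the variations in reason 3 concrete, here is a minimal plain-Python sketch of how user code might express them. It is not Airflow API: `Anchor`, `SlaRule`, and `missed_rules` are hypothetical names made up for illustration, task states are plain strings, and the timestamps are invented. It only shows that the anchor (DAG run start versus task start), multiple warning levels, and state filtering can all be expressed as user-side rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Anchor(Enum):
    """Where the SLA timedelta is measured from (hypothetical)."""
    DAGRUN_START = "dagrun_start"
    TASK_START = "task_start"


@dataclass(frozen=True)
class SlaRule:
    """One user-defined SLA rule (hypothetical, not an Airflow class)."""
    name: str
    limit: timedelta
    anchor: Anchor
    states: frozenset  # task states in which a miss should be reported


def missed_rules(rules, state, dagrun_start, task_start: Optional[datetime], now):
    """Return the names of rules whose deadline has passed for a task in `state`."""
    missed = []
    for rule in rules:
        start = dagrun_start if rule.anchor is Anchor.DAGRUN_START else task_start
        if start is None:
            # The task has not started yet, so a task-start rule cannot fire.
            continue
        if state in rule.states and now > start + rule.limit:
            missed.append(rule.name)
    return missed


rules = [
    # Two warning levels measured from DAG run start, only while unfinished.
    SlaRule("warn", timedelta(minutes=30), Anchor.DAGRUN_START, frozenset({"running"})),
    SlaRule("critical", timedelta(minutes=60), Anchor.DAGRUN_START, frozenset({"running"})),
    # A rule measured from task start.
    SlaRule("slow_task", timedelta(minutes=20), Anchor.TASK_START, frozenset({"running"})),
    # A rule that only reports once the task is in a finished state.
    SlaRule("late_finish", timedelta(minutes=40), Anchor.DAGRUN_START,
            frozenset({"success", "skipped"})),
]

dagrun_start = datetime(2023, 1, 1, 0, 0)
task_start = datetime(2023, 1, 1, 0, 20)
now = datetime(2023, 1, 1, 0, 45)  # evaluation time

print(missed_rules(rules, "running", dagrun_start, task_start, now))
print(missed_rules(rules, "success", dagrun_start, task_start, now))
```

With these made-up timestamps, a running task has passed the 30-minute warning and the 20-minute task-start limit but not the 60-minute critical limit, while a task that has already succeeded only trips the finished-state rule. Where such a check would run (for example, inside a user-defined deferrable task as described in reason 1) is left open here.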
   
   In contrast, I believe a DAG-level SLA will be a strictly positive feature. 
It will increase the general reliability of Airflow DAGs and can even alert 
us to job delays when [undefined behaviors 
happen](https://github.com/apache/airflow/issues/21225), all without negatively 
impacting the performance of the scheduler.
   
   If you have been interested in the SLA mechanism, or have been actively 
using the current version of the SLA mechanism, I would love to get your 
feedback on this proposal. I would love to work with you to try to come up with 
an SLA solution that meets user expectations!
   
   [Airflow Improvement 
Proposal](https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-57+SLA+as+a+DAG-Level+Feature)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.