o-nikolas commented on code in PR #53727: URL: https://github.com/apache/airflow/pull/53727#discussion_r2248966052
########## airflow-core/docs/howto/deadline-alerts.rst: ########## @@ -0,0 +1,263 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + + +Deadline Alerts +=============== + +The :class:`~airflow.sdk.definitions.deadline.DeadlineAlert` feature is the next evolution of +the old SLA. Deadline Alerts allow you to set time thresholds for your DAG runs and automatically +respond when those thresholds are exceeded. You can set up Deadline Alerts by choosing a built-in +reference point, setting an interval, and defining a response using either Airflow's Notifiers or +a custom callback function. + +Creating a Deadline Alert +------------------------- + +To create a Deadline Alert, there are three three components you must specify and one optional one: + +* Reference: When to start counting from +* Interval: How far before or after the reference point to trigger the alert +* Callback: What to do when the deadline is exceeded +* Callback Kwargs: Optional values to pass to the Callback when it is run + +Here is how Deadlines are calculated: + +:: + + [Reference] ------ [Interval] ------> [Deadline] + ^ ^ + | | + Start time Trigger point + +Below is an example DAG implementation. 
If the DAG has not finished 15 minutes after it was queued, send an email: + +.. code-block:: python + + from datetime import datetime, timedelta + from airflow import DAG + from airflow.sdk.definitions.deadline import DeadlineAlert, DeadlineReference + from airflow.providers.smtp.notifications.smtp import SmtpNotifier + from airflow.providers.standard.operators.empty import EmptyOperator + + with DAG( + dag_id="deadline_alert_example", + deadline=DeadlineAlert( + reference=DeadlineReference.DAGRUN_QUEUED_AT, + interval=timedelta(minutes=15), + callback=SmtpNotifier( + to="[email protected]", + subject="[Alert] DAG {{ dag.dag_id }} exceeded time threshold", + html_content="The DAG has been running for more than 15 minutes since being queued.", + ), + ), + ): + EmptyOperator(task_id="example_task") + +The timeline for this example would look like this: + +:: + + |------|-----------|---------|-----------|--------| + Scheduled Queued Started Deadline + 00:00 00:03 00:05 00:18 + +Using Built-in References +------------------------- + +Airflow provides several built-in reference points that you can use with DeadlineAlert: + +``DeadlineReference.DAGRUN_QUEUED_AT`` + Measures time from when the DagRun was queued. Useful for monitoring resource constraints. + +``DeadlineReference.DAGRUN_LOGICAL_DATE`` + References when the DAG run was scheduled to start. For example, setting an interval of + ``timedelta(minutes=15)`` would trigger the alert if the DAG hasn't completed 15 minutes + after it was scheduled to start, regardless of when (or if) it actually began executing. + Useful for ensuring scheduled DAGs complete before their next scheduled run. + +``DeadlineReference.FIXED_DATETIME`` + Specifies a fixed point in time. Useful when DAGs must complete by a specific time. + +Here's an example using a fixed datetime: + +.. 
code-block:: python + + tomorrow_at_ten = datetime.combine(datetime.now().date() + timedelta(days=1), time(10, 0)) + + with DAG( + dag_id="fixed_deadline_alert", + deadline=DeadlineAlert( + reference=DeadlineReference.FIXED_DATETIME(tomorrow_at_ten), + interval=timedelta(minutes=-30), # Alert 30 minutes before the reference. + callback=SmtpNotifier( + to="[email protected]", + subject="Report will be late", + html_content="The report will not be ready 30 minutes before the deadline.", + ), + ), + ): + EmptyOperator(task_id="example_task") + +The timeline for this example would look like this: + +:: + + |------|----------|---------|------------|--------| + Queued Start Deadline Reference + 09:15 09:17 09:30 10:00 + +.. note:: + Note that since the interval is a negative value, the deadline is before the reference in this case. + +Using Callbacks +--------------- + +When a deadline is exceeded, the callback is executed. You can use any async :doc:`Notifier </howto/notifications>` Review Comment: ```suggestion When a deadline is exceeded, the callback is executed. You can use any :doc:`Notifier </howto/notifications>` ``` Using sync or async is the goal right? Did you specifically mention async here because the sync path is still in progress? If so I'd just do something like: ```suggestion When a deadline is exceeded, the callback is executed. You can use any :doc:`Notifier </howto/notifications>` (Note: that synchronous callback support is in development) ```
########## airflow-core/docs/howto/deadline-alerts.rst: ########## +Using Callbacks +--------------- + +When a deadline is exceeded, the callback is executed. You can use any async :doc:`Notifier </howto/notifications>` +or create a custom callback function. + +Using Built-in Notifiers +^^^^^^^^^^^^^^^^^^^^^^^^ + +Here's an example using the Slack notifier if the DagRun has not finished within 30 minutes of it being queued: + +.. code-block:: python + + with DAG( + dag_id="slack_deadline_alert", + deadline=DeadlineAlert( + reference=DeadlineReference.DAGRUN_QUEUED_AT, + interval=timedelta(minutes=30), + callback=SlackNotifier( + slack_conn_id="slack_default", + channel="#alerts", + text="DAG {{ dag.dag_id }} has been running for more than 30 minutes since being queued.", + username="Airflow Alerts", + ), + ), + ): + EmptyOperator(task_id="example_task") + +Creating Custom Callbacks +^^^^^^^^^^^^^^^^^^^^^^^^^ + +You can create custom callbacks for more complex handling. The ``callback_kwargs`` specified in +the ``DeadlineAlert`` are passed to the callback function. + +.. code-block:: python + + # Place this method in `/files/plugins/deadline_callbacks.py` + async def custom_callback(**kwargs): + """Handle deadline violation with custom logic.""" + print(f"Deadline exceeded for DAG {kwargs.get("dag_id")}!") + print(f"Alert type: {kwargs.get("alert_type")}") + # Additional custom handling here + + + # Place this in a dag file + from datetime import timedelta + + from deadline_callbacks import custom_callback + + from airflow import DAG + from airflow.providers.standard.operators.empty import EmptyOperator + from airflow.sdk.definitions.deadline import DeadlineAlert, DeadlineReference + + with DAG( + dag_id="custom_deadline_alert", + deadline=DeadlineAlert( + reference=DeadlineReference.DAGRUN_QUEUED_AT, + interval=timedelta(minutes=15), + callback=custom_callback, + callback_kwargs={"alert_type": "time_exceeded", "dag_id": "custom_deadline_alert"}, + ), + ): + EmptyOperator(task_id="example_task") + +.. note:: + Regarding Deadline callbacks: + + * Async callbacks are recommended as they will be executed by the Triggerer. + * Users must ensure any async callback is importable by the Triggerer. + * One easy way to do this is to place the callback as a top-level method in a new file in the plugins folder. + * The Triggerer may need to be restarted when a callback is added or changed in order to reload the files. + + +Deadline Calculation Review Comment: This block is really nice to hit home the notion of how this all works. Maybe even pull it closer to the top.
########## airflow-core/docs/howto/deadline-alerts.rst: ########## +Creating Custom Callbacks +^^^^^^^^^^^^^^^^^^^^^^^^^ + +You can create custom callbacks for more complex handling. The ``callback_kwargs`` specified in +the ``DeadlineAlert`` are passed to the callback function. Review Comment: This and the code snippets need updating after https://github.com/apache/airflow/pull/53951#pullrequestreview-3080641355 right?
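While the snippets are being revisited, here is one possible shape for the custom callback, as a minimal sketch rather than the documented API. It runs under plain `asyncio` with no Airflow imports, and `asyncio.run` is only a stand-in for whatever component actually awaits the callback. Delivering `callback_kwargs` as keyword arguments is an assumption taken from the quoted docs text and may no longer hold once the linked PR lands. The single quotes inside the f-string are deliberate: reusing double quotes inside a double-quoted f-string, as the quoted snippet does, only parses on Python 3.12 and later.

```python
import asyncio


async def custom_callback(**kwargs) -> str:
    """Build and print a message from the keyword arguments it receives."""
    # Single quotes inside the f-string keep this valid before Python 3.12.
    message = (
        f"Deadline exceeded for DAG {kwargs.get('dag_id')} "
        f"({kwargs.get('alert_type')})"
    )
    print(message)
    return message


# Assumption: the values mirror the ``callback_kwargs`` dict in the quoted docs
# example; asyncio.run stands in for the process that would await the callback.
callback_kwargs = {"alert_type": "time_exceeded", "dag_id": "custom_deadline_alert"}
result = asyncio.run(custom_callback(**callback_kwargs))
```

Returning the message as well as printing it is only there to make the sketch easy to check; a real callback would not need to return anything.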
########## airflow-core/docs/core-concepts/dags.rst: ########## @@ -829,3 +829,40 @@ if it fails for ``N`` number of times consecutively. we can also provide and override these configuration from DAG argument: - ``max_consecutive_failed_dag_runs``: Overrides :ref:`config:core__max_consecutive_failed_dag_runs_per_dag`. + +Deadline Alerts +--------------- + +.. versionadded:: 3.1 + +Deadline Alerts allow you to set time thresholds for your DAG runs and automatically respond when those +thresholds are exceeded. You can set deadlines relative to a fixed datetime, use one of the available +calculated references (like DAG queue time or start time), or implement your own custom reference. +When a deadline is exceeded, it triggers a callback which can notify you or take other actions. + +Here's a simple example using the existing email Notifier: Review Comment: I like capitalizing the proper noun here, FWIW
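The paragraph under review says a deadline is set relative to a fixed datetime or a calculated reference. The underlying arithmetic (deadline = reference + interval) can be sketched with the standard library alone. This is an illustration, not Airflow's implementation: `compute_deadline` is a hypothetical helper, and the timestamps are made up to mirror the two timelines in the how-to page quoted earlier.

```python
from datetime import datetime, timedelta, timezone


def compute_deadline(reference: datetime, interval: timedelta) -> datetime:
    """Return reference + interval.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    return reference + interval


# Queued-at style reference: the deadline falls 15 minutes after queueing.
queued_at = datetime(2025, 1, 1, 0, 3, tzinfo=timezone.utc)
after_queue = compute_deadline(queued_at, timedelta(minutes=15))

# Fixed-datetime style reference with a negative interval:
# the deadline falls 30 minutes before the reference.
fixed_reference = datetime(2025, 1, 2, 10, 0, tzinfo=timezone.utc)
before_fixed = compute_deadline(fixed_reference, timedelta(minutes=-30))

print(after_queue.strftime("%H:%M"))   # 00:18
print(before_fixed.strftime("%H:%M"))  # 09:30
```

A negative interval simply moves the deadline ahead of the reference, which is why the fixed-datetime example alerts at 09:30 for a 10:00 reference.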
