The GitHub Actions job "Tests" on airflow.git has failed.
Run started by GitHub user ephraimbuddy (triggered by ephraimbuddy).

Head commit for run:
a566b7c6dfeac9fbecc2d9f47144098c1ee854c4 / Ephraim Anierobi 
<[email protected]>
Improve scheduler loop by reducing repetative `TI.are_dependencies_met`

TI.are_dependencies_met run over and over even when no changes have happened 
that would allow it to pass. This causes the scheduler loop to get slower and 
slower as more blocked TIs pile up.

This scenario is easy to reproduce with this DAG (courtesy of @rob-1126):
Before running it, enable debug logging

```python

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

class FailsFirstTimeOperator(BashOperator):
    def execute(self, context):
        if context["ti"].try_number == 1:
            raise Exception("I fail the first time on purpose to test retry 
delay")
        print(context["ti"].try_number)
        return super().execute(context)

one_day_of_seconds = 60 * 60 * 24
with DAG(dag_id="waity", schedule_interval=None, start_date=datetime(2021, 1, 
1)):
    starting_task = FailsFirstTimeOperator(task_id="starting_task", 
retry_delay=one_day_of_seconds, retries=1, bash_command="echo whee")
    for i in range(0,1*1000):
        task = BashOperator(task_id=f"task_{i}", bash_command="sleep 1")
        starting_task >> task

```

Simply run multiples of the above DAG (6 dagruns is enough to observe the 
delay). Note that the scheduler loop is now taking ~4-6 seconds, and grows with 
each new waity dagrun.

The solution was to change the last_scheduling_decision to next_schedulable 
date for the dagrun.
1. When the task instance enter up_for_retry state, we set the next_schedulable 
date for the dagrun to the next retry date of the ti. This way, we stop 
unnecessary dependency checks for other TIs blocked by this retry state.
2. In other cases, we check once if the dependencies of a TI are met, if not 
met, we nullify the next_schedulable date.
3. The next schedulable date is updated for a dagrun when any of its 
taskinstance's state is in the finished state.

Report URL: https://github.com/apache/airflow/actions/runs/9563565626

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to