ephraimbuddy opened a new pull request, #40696:
URL: https://github.com/apache/airflow/pull/40696

   TI.are_dependencies_met runs over and over even when nothing has changed
   that would allow it to pass. This causes the scheduler loop to get slower and
   slower as more blocked TIs pile up.
   
   This scenario is easy to reproduce with this DAG (courtesy of @rob-1126):
   Before running it, enable debug logging so the scheduler loop timing is visible in the logs.
   
   ```
   from datetime import datetime, timedelta

   from airflow import DAG
   from airflow.operators.bash import BashOperator


   class FailsFirstTimeOperator(BashOperator):
       """BashOperator that fails on its first try, forcing a long retry delay."""

       def execute(self, context):
           if context["ti"].try_number == 1:
               raise Exception("I fail the first time on purpose to test retry delay")
           print(context["ti"].try_number)
           return super().execute(context)


   one_day_of_seconds = 60 * 60 * 24

   with DAG(dag_id="waity", schedule=None, start_date=datetime(2021, 1, 1)):
       # The upstream task fails once and then waits a full day before retrying,
       # leaving every downstream task blocked on it.
       starting_task = FailsFirstTimeOperator(
           task_id="starting_task",
           retry_delay=timedelta(seconds=one_day_of_seconds),
           retries=1,
           bash_command="echo whee",
       )
       for i in range(1000):
           task = BashOperator(task_id=f"task_{i}", bash_command="sleep 1")
           starting_task >> task
   ```
   
   Simply run multiple dagruns of the above DAG (6 dagruns is enough to observe the delay).
   Note that the scheduler loop now takes ~4-6 seconds, and the time grows with each
   new waity dagrun.
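
   For reference, a minimal way to drive the reproduction (this is only a
   convenience sketch, not part of the change: it assumes the DAG above is
   deployed as `waity`, a scheduler is running, and the `airflow` CLI is on
   PATH; debug logging can be enabled via the
   `AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG` environment variable):

   ```
   # Convenience sketch (not part of this PR): trigger several runs of the
   # "waity" DAG via the Airflow CLI so the scheduler loop slowdown shows up.
   import subprocess

   subprocess.run(["airflow", "dags", "unpause", "waity"], check=True)
   for _ in range(6):  # ~6 dagruns is enough to observe the delay
       subprocess.run(["airflow", "dags", "trigger", "waity"], check=True)
   ```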
   
   This commit adds a new column (blocked_by_upstream) to the TaskInstance table.
   The column is updated any time a task instance is blocked by an upstream task
   instance, so the scheduler can skip the repetitive dependency checks for those
   task instances.
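
   To make the intent concrete, here is an illustrative sketch of the
   short-circuit such a flag enables (this is not the actual scheduler code;
   `StubTI` and `needs_dependency_check` are hypothetical stand-ins, only the
   `blocked_by_upstream` name comes from this PR):

   ```
   # Illustrative sketch only: once a TI is marked blocked_by_upstream, the
   # scheduler can skip TI.are_dependencies_met() until an upstream TI changes.
   from dataclasses import dataclass


   @dataclass
   class StubTI:
       task_id: str
       blocked_by_upstream: bool = False  # flag proposed by this PR


   def needs_dependency_check(ti: StubTI, upstream_changed: bool) -> bool:
       # Re-check only when something upstream actually changed, or when the TI
       # has never been found to be blocked.
       return upstream_changed or not ti.blocked_by_upstream


   # 1000 downstream tasks stuck behind a retrying upstream task:
   blocked = [StubTI(f"task_{i}", blocked_by_upstream=True) for i in range(1000)]
   to_check = [ti for ti in blocked if needs_dependency_check(ti, upstream_changed=False)]
   print(len(to_check))  # 0 -> no repeated dependency checks while nothing changed
   ```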
   
   closes: https://github.com/apache/airflow/pull/40293

