[GitHub] [airflow] leonsmith edited a comment on issue #18501: Scheduler overloaded when backfilling by clearing DAG history

GitBox Mon, 04 Oct 2021 03:58:37 -0700


leonsmith edited a comment on issue #18501:
URL: https://github.com/apache/airflow/issues/18501#issuecomment-933369645



   I think the fix here is to use a window function in [next_dagruns_to_examine](https://github.com/apache/airflow/blob/main/airflow/models/dagrun.py#L211) so that, for queued runs, it only returns as many DAG runs per DAG as that DAG's remaining `max_active_runs` slots allow.
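The window-function idea above can be tried outside Airflow with Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). This is a minimal sketch: the `dag` and `dag_run` tables are a hypothetical, heavily reduced stand-in for Airflow's schema, and the rows are made up. A window function cannot appear in a `WHERE` clause, so the row numbers are computed in a subquery and the cap is applied in the outer query.

```python
import sqlite3

# Hypothetical toy schema: only the columns the capping logic needs.
SCHEMA = """
CREATE TABLE dag (dag_id TEXT PRIMARY KEY, max_active_runs INTEGER);
CREATE TABLE dag_run (
    id INTEGER PRIMARY KEY,
    dag_id TEXT,
    state TEXT,
    execution_date TEXT
);
"""

# Rank each DAG's queued runs (oldest first) in a subquery, then keep only
# the ranks that fit into max_active_runs minus the runs already running.
CAPPED_QUEUED_RUNS = """
SELECT ranked.id
FROM (
    SELECT dr.id,
           dr.dag_id,
           ROW_NUMBER() OVER (
               PARTITION BY dr.dag_id ORDER BY dr.execution_date
           ) AS rn
    FROM dag_run AS dr
    WHERE dr.state = 'queued'
) AS ranked
JOIN dag ON dag.dag_id = ranked.dag_id
LEFT JOIN (
    SELECT dag_id, COUNT(*) AS num_running
    FROM dag_run
    WHERE state = 'running'
    GROUP BY dag_id
) AS running_counts ON running_counts.dag_id = ranked.dag_id
WHERE ranked.rn <= dag.max_active_runs - COALESCE(running_counts.num_running, 0)
ORDER BY ranked.id
"""


def capped_queued_run_ids(conn: sqlite3.Connection) -> list:
    """Return the ids of queued runs that fit under each DAG's limit."""
    return [row[0] for row in conn.execute(CAPPED_QUEUED_RUNS)]


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with a small, made-up set of DAGs and runs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # dag "a": max_active_runs=2 with one run running -> 1 open slot
    # dag "b": max_active_runs=1 with nothing running -> 1 open slot
    conn.executemany("INSERT INTO dag VALUES (?, ?)", [("a", 2), ("b", 1)])
    conn.executemany(
        "INSERT INTO dag_run VALUES (?, ?, ?, ?)",
        [
            (1, "a", "running", "2021-01-01"),
            (2, "a", "queued", "2021-01-02"),
            (3, "a", "queued", "2021-01-03"),
            (4, "b", "queued", "2021-01-01"),
            (5, "b", "queued", "2021-01-02"),
        ],
    )
    return conn


print(capped_queued_run_ids(build_demo_db()))  # [2, 4]
```

Only the oldest queued run of each DAG is returned, because each DAG has exactly one open slot.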
   
   I know this is the core scheduler loop, so performance is a concern here. Before I put effort into writing tests, is there any objection to an approach like the following?
   
   Are there any concerns about this type of approach on the other database backends?
   
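   ```python
           # Fragment of DagRun.next_dagruns_to_examine: func, DagModel, State,
           # DagRunState and nulls_first are assumed to be available as in
           # airflow/models/dagrun.py, and `query` is the base query built
           # earlier in the method (already joined to DagModel).
           if state == State.QUEUED:
               # For queued dag runs, only return as many per DAG as there are
               # open slots: max_active_runs minus the runs already RUNNING.
   
               # Number each DAG's queued runs, oldest execution_date first.
               # A window function cannot be used in WHERE, so compute it in
               # a subquery and filter on its result.
               row_number_column = (
                   func.row_number()
                   .over(partition_by=cls.dag_id, order_by=cls.execution_date)
                   .label('row_num')
               )
               ranked_drs = query.with_entities(cls.id, row_number_column).subquery()
   
               # Count the currently running dag runs per DAG.
               running_drs = (
                   session.query(cls.dag_id, func.count(cls.state).label('num_running'))
                   .filter(cls.state == DagRunState.RUNNING)
                   .group_by(cls.dag_id)
                   .subquery()
               )
   
               open_dagrun_slots = DagModel.max_active_runs - func.coalesce(
                   running_drs.c.num_running, 0
               )
               query = (
                   query.join(ranked_drs, ranked_drs.c.id == cls.id)
                   .outerjoin(running_drs, running_drs.c.dag_id == cls.dag_id)
                   # row_number() starts at 1, so `<=` keeps exactly
                   # open_dagrun_slots runs per DAG.
                   .filter(ranked_drs.c.row_num <= open_dagrun_slots)
               )
   
           query = query.order_by(
               nulls_first(cls.last_scheduling_decision, session=session),
               cls.execution_date,
           )
   ```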
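For the tests, one option is to compare the query's output against a plain-Python model of the intended selection. This is a minimal sketch, assuming the rule is "for each DAG, take the oldest queued runs up to `max_active_runs` minus the runs already running"; `Run` and `expected_queued_runs` are hypothetical names, not part of Airflow.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a DagRun row: only the fields the cap uses."""
    run_id: int
    dag_id: str
    state: str  # "queued" or "running"
    execution_date: datetime


def expected_queued_runs(runs: List[Run], max_active_runs: Dict[str, int]) -> List[int]:
    """Return the sorted ids of queued runs that fit under each DAG's limit."""
    running: Dict[str, int] = defaultdict(int)
    queued: Dict[str, List[Run]] = defaultdict(list)
    for run in runs:
        if run.state == "running":
            running[run.dag_id] += 1
        elif run.state == "queued":
            queued[run.dag_id].append(run)

    selected = []
    for dag_id, dag_runs in queued.items():
        # Clamp at 0 so an over-full DAG does not produce a negative slice.
        open_slots = max(max_active_runs[dag_id] - running[dag_id], 0)
        oldest_first = sorted(dag_runs, key=lambda r: r.execution_date)
        selected.extend(r.run_id for r in oldest_first[:open_slots])
    return sorted(selected)


runs = [
    Run(1, "a", "running", datetime(2021, 1, 1)),
    Run(2, "a", "queued", datetime(2021, 1, 2)),
    Run(3, "a", "queued", datetime(2021, 1, 3)),
    Run(4, "b", "queued", datetime(2021, 1, 1)),
    Run(5, "b", "queued", datetime(2021, 1, 2)),
]
print(expected_queued_runs(runs, {"a": 2, "b": 1}))  # [2, 4]
```

Generating random run sets and asserting that the scheduler query returns the same ids as this model would exercise the off-by-one and "no running runs" (outer join) cases.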


