Jorricks commented on issue #25615:
URL: https://github.com/apache/airflow/issues/25615#issuecomment-1208992911

   I did some initial exploration of what could be causing this issue.
   I suspect the problem is in `_start_queued_dagruns`, shown [here](https://github.com/apache/airflow/blob/2.2.3/airflow/jobs/scheduler_job.py#L935).
   I checked the ordering and the limit of the query behind `dag_runs = self._get_next_dagruns_to_examine(State.QUEUED, session)` quite thoroughly.
   The ordering looks correct to me, because `last_scheduling_decision` should be None for these queued runs. Unfortunately, we are still planning our upgrade to Airflow 2.2.3, so I have not been able to verify whether `last_scheduling_decision` is in fact None.
   
   Then, the only explanation I have been able to come up with so far is that multiple schedulers running this loop concurrently could cause the problem. Let me reason about this:
   Imagine we have a DAG called `my_dag` that allows a maximum of 16 running DagRuns.
   1. Scheduler A enters this loop and tries to move queued DagRuns of `my_dag` to running. At this point in time (T), the number of active runs is 15, one less than the limit, so Scheduler A starts one extra run.
   2. While Scheduler A is still in its loop, at time (T+1), Scheduler B marks a DagRun as success.
   3. Scheduler C enters this loop and tries to move queued DagRuns of `my_dag` to running. At this point (T+2), Scheduler C is also allowed to start a DagRun. However, Scheduler A has locked all the earliest DagRuns, so Scheduler C falls back to much newer ones. This could lead to starting a DagRun that is much later than the one that should have come next after the run Scheduler A started.
   4. Scheduler A completes its loop and releases its row locks.
   5. Scheduler C completes its loop and releases its row locks.
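
The interleaving above can be replayed as a small deterministic model. This sketch assumes each scheduler selects queued runs oldest-first with something like `SELECT ... FOR UPDATE SKIP LOCKED`, locks up to `batch` rows, and only starts as many as the free slots allow. `LockTable`, `simulate`, `batch` and the run names are hypothetical, and transaction isolation is deliberately ignored. It models the hypothesis only and is not Airflow code.

```python
from typing import Dict, List, Set


class LockTable:
    """Hypothetical model of the row locks held by open scheduler transactions."""

    def __init__(self) -> None:
        self.locked: Set[str] = set()

    def pick_queued(self, queued: List[str], limit: int) -> List[str]:
        """Mimic FOR UPDATE SKIP LOCKED: skip rows held by another
        transaction, then lock up to `limit` of the remaining rows in order."""
        picked = [run for run in queued if run not in self.locked][:limit]
        self.locked.update(picked)
        return picked

    def release(self, runs: List[str]) -> None:
        """Transaction end: the rows this scheduler locked become available again."""
        self.locked.difference_update(runs)


def simulate(queued: List[str], max_active_runs: int, running: int, batch: int) -> Dict[str, List[str]]:
    """Replay steps 1-5; `queued` must be sorted oldest execution_date first.
    Isolation is ignored: every scheduler sees the same `running` count."""
    locks = LockTable()

    # Step 1 (T): A locks a batch of the oldest queued runs but only has room
    # to start some of them; the rest stay queued and locked.
    a_locked = locks.pick_queued(queued, batch)
    a_started = a_locked[: max(0, max_active_runs - running)]
    running += len(a_started)

    # Step 2 (T+1): Scheduler B marks one running DagRun as success, freeing a slot.
    running -= 1

    # Step 3 (T+2): C has a free slot, but the rows A still holds are skipped.
    c_locked = locks.pick_queued(queued, batch)
    c_started = c_locked[: max(0, max_active_runs - running)]

    # Steps 4 and 5: A and C finish their loops and release their row locks.
    locks.release(a_locked)
    locks.release(c_locked)
    return {"a_started": a_started, "c_started": c_started}


queued = [f"run_{i:02d}" for i in range(1, 41)]  # run_01 is the oldest
print(simulate(queued, max_active_runs=16, running=15, batch=20))
# {'a_started': ['run_01'], 'c_started': ['run_21']}
```

In this model, C starts `run_21` while `run_02` is still waiting, matching step 3. With `batch=1`, C would pick `run_02` instead, which suggests the size of the jump depends on how many rows A locks but does not start.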


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.