wolfier opened a new issue, #45388:
URL: https://github.com/apache/airflow/issues/45388

   ### Apache Airflow version
   
   2.10.4
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   Two queued dagruns of a DAG with max_active_runs of 1 started within 0.2 
seconds of each other.
   
   The deployment has two schedulers, A and B. I suspect scheduler A started 
one dagrun and scheduler B started the other. Each scheduler queries the 
active dagrun information in every scheduling loop via 
[_start_queued_dagruns](https://github.com/apache/airflow/blob/2.10.4/airflow/jobs/scheduler_job_runner.py#L1537), 
and that information is not shared between schedulers, so the limit can be 
exceeded. In this case both schedulers presumably saw no active dagruns and 
each started its own queued dagrun.
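   
As an illustration of the suspected check-then-act race, here is a toy model in plain Python. It is not Airflow's scheduler code: `DagRun`, `count_running`, and `scheduler_loop` are hypothetical stand-ins, and the interleaving is forced by hand to show the worst case.

```python
from dataclasses import dataclass


@dataclass
class DagRun:
    run_id: str
    state: str = "queued"


def count_running(runs):
    """Number of dagruns currently in the running state."""
    return sum(1 for run in runs if run.state == "running")


def scheduler_loop(runs, candidate, max_active_runs):
    """One toy scheduling loop, split into a read step and a write step.

    Written as a generator so the steps of two schedulers can be
    interleaved deterministically.
    """
    active = count_running(runs)  # step 1: read the active-run count
    yield
    if active < max_active_runs:  # step 2: act on the (possibly stale) count
        candidate.state = "running"
    yield


runs = [DagRun("run_1"), DagRun("run_2")]
scheduler_a = scheduler_loop(runs, runs[0], max_active_runs=1)
scheduler_b = scheduler_loop(runs, runs[1], max_active_runs=1)

# Both schedulers read before either one writes.
next(scheduler_a)
next(scheduler_b)
# Both then act on the same stale count of zero.
next(scheduler_a)
next(scheduler_b)

running_after_race = count_running(runs)
print(running_after_race)
```

With this interleaving both dagruns end up running even though `max_active_runs` is 1, which matches the behaviour described above.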
   
   The open question is how one dagrun ended up in one scheduler's query but 
not in the other's. One explanation is that scheduler A locked only one 
dagrun row because the other row fell outside the 
[max_dagruns_per_loop_to_schedule](https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#max-dagruns-per-loop-to-schedule)
 limit, and scheduler B then picked up the dagrun that A did not query. 
This also requires both scheduling loops to run the query at nearly the same 
time, which is very unlikely but is what I suspect happened.
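   
A rough sketch of how the two schedulers could end up with disjoint candidates, assuming the queued-dagrun query behaves like a `SELECT ... LIMIT n FOR UPDATE SKIP LOCKED`. The function below is a hypothetical in-memory stand-in for that query, not Airflow code, and the limit of 1 is chosen only to make the effect visible.

```python
def select_for_update_skip_locked(rows, locked, limit):
    """Toy model of a row-locking query with a row limit.

    Returns up to `limit` rows that nobody else has locked, and locks them.
    """
    picked = []
    for row in rows:
        if len(picked) == limit:
            break
        if row in locked:
            continue  # another scheduler holds this row, skip it
        locked.add(row)
        picked.append(row)
    return picked


queued = ["run_1", "run_2"]
locked = set()  # row locks shared through the database

# Scheduler A queries first and, because of the limit, locks only run_1.
picked_by_a = select_for_update_skip_locked(queued, locked, limit=1)
# Scheduler B queries before A commits, skips the locked row, and gets run_2.
picked_by_b = select_for_update_skip_locked(queued, locked, limit=1)

print(picked_by_a, picked_by_b)
```

Each scheduler now holds a different queued dagrun, so neither lock prevents the other from starting its run.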
   
   ### What you think should happen instead?
   
   _No response_
   
   ### How to reproduce
   
   This scenario requires extreme luck (or lack thereof) so I have not been 
able to reproduce this behaviour.
   
   Perhaps the key to reproducing this is setting 
max_dagruns_per_loop_to_schedule to 1.
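   
For anyone attempting a reproduction, one way to apply that setting is through Airflow's `AIRFLOW__{SECTION}__{KEY}` environment variable convention. The snippet below only builds the variable for a scheduler process environment; `airflow_env_var` is a hypothetical helper, and it assumes the option lives in the `[scheduler]` section.

```python
import os


def airflow_env_var(section, key):
    """Build the environment variable name for an Airflow config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


env_name = airflow_env_var("scheduler", "max_dagruns_per_loop_to_schedule")

# Copy the current environment and override the option for the scheduler.
scheduler_env = dict(os.environ)
scheduler_env[env_name] = "1"

print(env_name)
```

Both schedulers would need to be started with this environment for the limit to apply to each of them.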
   
   ### Operating System
   
   n/a
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

