dstandish commented on code in PR #59724:
URL: https://github.com/apache/airflow/pull/59724#discussion_r2645737644
##########
airflow-core/src/airflow/jobs/scheduler_job_runner.py:
##########
@@ -1815,9 +1816,10 @@ def _create_dag_runs(self, dag_models:
Collection[DagModel], session: Session) -
# we need to set DagModel.next_dagrun_info if the DagRun already
exists or if we
# create a new one. This is so that in the next scheduling loop we
try to create new runs
# instead of falling in a loop of IntegrityError.
Review Comment:
do you mean because i added the if condition here?
```
if created_dag_run:
self._update_next_dagrun_fields(
serdag=serdag,
dag_model=dag_model,
session=session,
active_non_backfill_runs=active_runs_of_dags[serdag.dag_id],
data_interval=data_interval,
)
```
> Would that cause a bug where a DagRun exists but DagModel.next_dagrun
wasn't updated (the edge case mentioned in the comment), the scheduler will get
stuck in an infinite loop attempting to create the same dag run repeatedly?
No this is not possible because the dag processor, every time it processes a
dag, it runs dag_model.calculate_dagrun_date_fields and if it needs to be
incremented, it will be.
Basically, if the scheduler is already trying to create the dagrun that
already exists, then something effed up, or something weird happed (like manual
trigger of a future logical date).
The most common scenario would be, I believe, that the scheduler
_decremented_ next_dagrun.
This can happen in the scheduler because suppose it creates 5 runs for
catchup, but then an older one finishes after a later one -- then the scheduler
may set the next_dagrun backward because of the way this code works.
**I think we should add logic so that the scheduler is never setting the
next_dagrun backward**. but i am not trying to change that here.
So ultimately, I don't think it makes much difference whether we
conditionally update based on created or not -- but it doesn't hurt to always
update in this context, so I removed the condition.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]