kaxil commented on code in PR #62561:
URL: https://github.com/apache/airflow/pull/62561#discussion_r2880363898
##########
airflow-core/src/airflow/jobs/scheduler_job_runner.py:
##########
@@ -1981,6 +1981,9 @@ def _mark_backfills_complete(self, session: Session = NEW_SESSION) -> None:
# todo: AIP-78 simplify this function to an update statement
query = select(Backfill).where(
Backfill.completed_at.is_(None),
+ # Guard: backfill must have at least one association,
+ # otherwise it is still being set up (see #61375).
exists(select(BackfillDagRun.id).where(BackfillDagRun.backfill_id == Backfill.id)),
Review Comment:
Should we fix the root cause instead? `_create_backfill()` calls
`session.commit()`
([backfill.py L605](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/backfill.py#L605))
to persist the `Backfill` row and only afterwards creates the
`BackfillDagRun`/`DagRun` rows. That early commit is what opens the race
window. Changing it to `session.flush()` would still assign `br.id` (needed as
the FK for `BackfillDagRun`) without committing. The `create_session()` context
manager already commits on successful exit, so all rows would be committed
atomically, which eliminates the race window entirely.
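
To illustrate the race window, here is a minimal sketch using the standard-library `sqlite3` module instead of SQLAlchemy or Airflow's models. The table layout, `create_backfill`, and `backfills_without_runs` are hypothetical stand-ins, not Airflow code. `cursor.lastrowid` plays the role of `session.flush()` assigning `br.id`: the parent id is usable before anything is committed. A second connection stands in for the scheduler.

```python
import os
import sqlite3
import tempfile

# Hypothetical, simplified schema; not Airflow's real tables.
SCHEMA = """
CREATE TABLE backfill (id INTEGER PRIMARY KEY, dag_id TEXT);
CREATE TABLE backfill_dag_run (
    id INTEGER PRIMARY KEY,
    backfill_id INTEGER REFERENCES backfill(id)
);
"""


def backfills_without_runs(conn):
    """Ids of backfill rows that have no associated run rows, as seen by conn."""
    rows = conn.execute(
        "SELECT b.id FROM backfill b WHERE NOT EXISTS "
        "(SELECT 1 FROM backfill_dag_run r WHERE r.backfill_id = b.id)"
    ).fetchall()
    return [row[0] for row in rows]


def create_backfill(writer, observer, commit_early):
    """Insert a parent row and one child row; report what the observer saw in between."""
    cur = writer.execute("INSERT INTO backfill (dag_id) VALUES ('example')")
    backfill_id = cur.lastrowid  # id is known before any commit
    if commit_early:
        writer.commit()  # parent becomes visible without its children
    seen_midway = backfills_without_runs(observer)
    writer.execute(
        "INSERT INTO backfill_dag_run (backfill_id) VALUES (?)", (backfill_id,)
    )
    writer.commit()  # single commit point when commit_early is False
    return seen_midway


def demo():
    results = {}
    for commit_early in (True, False):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "demo.db")
            writer = sqlite3.connect(path)
            observer = sqlite3.connect(path)
            writer.executescript(SCHEMA)
            results[commit_early] = create_backfill(writer, observer, commit_early)
            writer.close()
            observer.close()
    return results


if __name__ == "__main__":
    print(demo())
```

With the early commit the observer sees a backfill that has no runs yet. With a single commit at the end it sees nothing until both rows exist.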
If the guard approach is preferred, there's an edge case worth considering:
if `_create_backfill` fails *after* committing the `Backfill` row but *before*
creating any `BackfillDagRun` rows (e.g. `RuntimeError("No runs to create...")` on
[L616](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/backfill.py#L616)),
this guard means `_mark_backfills_complete` will never clean it up. Combined
with the
[`AlreadyRunningBackfill` check](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/backfill.py#L585-L589),
that orphaned backfill would permanently block all future backfills for the
same DAG.
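
A rough sketch of that edge case, again with `sqlite3` and hypothetical table and function names. Both queries are simplified stand-ins for the guarded completion query and the already-running check, not the real Airflow implementations.

```python
import sqlite3


def demo_orphan():
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified schema; not Airflow's real tables.
    conn.executescript(
        """
        CREATE TABLE backfill (
            id INTEGER PRIMARY KEY, dag_id TEXT, completed_at TEXT
        );
        CREATE TABLE backfill_dag_run (
            id INTEGER PRIMARY KEY, backfill_id INTEGER
        );
        """
    )
    # Orphan: the parent row was committed, then run creation failed.
    conn.execute("INSERT INTO backfill (dag_id) VALUES ('example')")
    conn.commit()

    # Stand-in for the guarded completion query: incomplete backfills
    # that have at least one associated run.
    completable = conn.execute(
        "SELECT b.id FROM backfill b WHERE b.completed_at IS NULL AND EXISTS "
        "(SELECT 1 FROM backfill_dag_run r WHERE r.backfill_id = b.id)"
    ).fetchall()

    # Stand-in for the already-running check: any incomplete backfill
    # for the same DAG counts as active.
    active = conn.execute(
        "SELECT COUNT(*) FROM backfill WHERE dag_id = ? AND completed_at IS NULL",
        ("example",),
    ).fetchone()[0]
    conn.close()
    return completable, active


if __name__ == "__main__":
    print(demo_orphan())
```

The orphan is never selected for completion, yet it still counts as an active backfill for its DAG.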