Arunodoy18 opened a new pull request, #60761:
URL: https://github.com/apache/airflow/pull/60761
Closes #60559
Summary
Airflow currently allows multiple DAG files to define the same dag_id and
keeps only the last parsed DAG, overwriting the earlier definition without
any warning. Because DAG parse order is nondeterministic (especially in
distributed environments such as Composer/GCS), which definition wins can
change from one scheduler run to the next.
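To make the failure mode concrete, here is a minimal sketch that models the
DagBag as a plain Python dict keyed by dag_id. This is a deliberate
simplification, not Airflow's actual code, and the dag_id and file paths are
hypothetical. With last-write-wins semantics, the surviving DAG depends
entirely on parse order:

```python
# Hypothetical simplification: model a DagBag as a dict keyed by dag_id.
# Each entry is (dag_id, fileloc); the file paths are made up for illustration.
parsed_files = [
    ("etl_daily", "dags/team_a/etl.py"),
    ("etl_daily", "dags/team_b/etl.py"),
]


def build_bag(files):
    """Add DAGs in parse order; a later dag_id silently replaces an earlier one."""
    bag = {}
    for dag_id, fileloc in files:
        bag[dag_id] = fileloc  # no check: last write wins
    return bag


# Same files, two different parse orders -> two different surviving DAGs.
forward = build_bag(parsed_files)
backward = build_bag(list(reversed(parsed_files)))
print(forward["etl_daily"])   # dags/team_b/etl.py
print(backward["etl_daily"])  # dags/team_a/etl.py
```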
This PR introduces a minimal, backward-compatible safeguard: it detects
duplicate dag_id collisions during DagBag parsing and emits a clear warning
in the scheduler logs.
What changed
- Detect when a DAG whose dag_id already exists in the DagBag is being
added.
- Log a warning showing:
• the dag_id
• the file location of the DAG already in the DagBag
• the file location of the DAG that replaces it
- Preserve existing behavior: the last parsed DAG still wins, parsing does
not fail, and the UI is unchanged.
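The detection described above can be sketched as follows. `SketchDagBag` and
its `add` method are hypothetical names used only for illustration; they are
not Airflow's `DagBag` API, and the actual change in this PR may differ. The
sketch assumes DAGs are tracked by dag_id and file location only:

```python
import logging

log = logging.getLogger("dagbag_sketch")


class SketchDagBag:
    """Hypothetical stand-in for a DagBag; not Airflow's class or this PR's code.

    Tracks only dag_id -> file location to show "warn, but keep last parsed".
    """

    def __init__(self):
        self.dags = {}  # dag_id -> file location of the DAG currently kept

    def add(self, dag_id, fileloc):
        """Add a DAG; return True if it replaced a DAG from a different file."""
        existing = self.dags.get(dag_id)
        collision = existing is not None and existing != fileloc
        if collision:
            # The warning includes the dag_id, the original location and the new one.
            log.warning(
                "Duplicate dag_id %r: DAG from %s is overwritten by DAG from %s",
                dag_id,
                existing,
                fileloc,
            )
        # Existing behavior is preserved: the last parsed DAG still wins.
        self.dags[dag_id] = fileloc
        return collision


logging.basicConfig(level=logging.WARNING)
bag = SketchDagBag()
bag.add("etl_daily", "dags/team_a/etl.py")
bag.add("etl_daily", "dags/team_b/etl.py")  # logs one warning
print(bag.dags["etl_daily"])  # dags/team_b/etl.py
```

The sketch warns only when the incoming file differs from the one already
recorded, on the assumption that re-parsing the same file should not be
flagged as a collision. It returns whether a collision occurred purely to
make the behavior easy to check.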
Why this is safe
This mirrors Airflow’s existing duplicate task_id handling pattern (warn
instead of fail) and improves visibility without breaking backward
compatibility.
Scope
Only the DagBag parsing logic is updated. There are no changes to
scheduling, serialization, or the UI.
Behavior after fix
Users will now see an explicit warning in the scheduler logs whenever
duplicate dag_id definitions are encountered, so nondeterministic overrides
are no longer silent.