1fanwang opened a new pull request, #66819:
URL: https://github.com/apache/airflow/pull/66819
### Problem
`DagRun.update_state()` already detects the task-deadlock case — when every
unfinished task is unrunnable — logs an error, and notifies the state-changed
listeners with `msg="all_tasks_deadlocked"`. It does not emit a Stats counter,
so operators who want to alert on deadlock-induced failures end up grepping
scheduler logs or scraping state-change notifications. There's no first-class
signal alongside `zombies.zombie_unfinished_run_failure_count` or the
executor-event failure counters.
### Fix
Add `stats.incr("dagrun.deadlocked", tags={"dag_id": self.dag_id,
"run_type": self.run_type})` at the existing log + notify call site in
`DagRun.update_state`, and register the new counter in the observability
metrics template.
### Tests
Added `test_dagrun_deadlock_emits_stats_counter` in
`airflow-core/tests/unit/models/test_dagrun.py`. Mirrors the existing
`test_dagrun_deadlock` fixture (invalid `trigger_rule` to force the deadlock
branch), mocks `stats.incr`, and asserts the call with the expected name and
tags.
Closes #66818
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]