1fanwang opened a new issue, #66818:
URL: https://github.com/apache/airflow/issues/66818
### Apache Airflow version
main (development)
### What happened?
`DagRun.update_state()` already detects "task deadlock" — the
all-tasks-unfinished-but-none-schedulable branch in
`airflow-core/src/airflow/models/dagrun.py` (around line 1216):
```python
self.log.error("Task deadlock (no runnable tasks); marking run %s failed", self)
self.set_state(DagRunState.FAILED)
self.notify_dagrun_state_changed(msg="all_tasks_deadlocked")
```
This branch logs the error and notifies state-change listeners, but it does not emit a Stats counter.
Operators who want to alert on deadlock-induced failures end up grepping
scheduler logs or scraping state-change notifications.
### What you think should happen instead?
Emit `Stats.incr("dagrun.deadlocked", tags={"dag_id": ..., "run_type":
...})` at the same site, so the existing statsd / OTel pipeline picks it up
automatically.
### Use case / motivation
Track deadlock-induced failure rates as a first-class signal alongside
`zombies.zombie_unfinished_run_failure_count` and the executor-event failure
counters. Dashboards / alerts can then chart deadlock rate per DAG and run type
without log scraping.
### Proposal
Add a one-line `Stats.incr(...)` call next to the existing log and notify calls. A test
mocks `Stats.incr` and asserts that the counter is emitted when the deadlock branch fires.
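
A minimal, self-contained sketch of the proposed change and its test. It does not import Airflow: `Stats` and `FakeDagRun` below are hypothetical stand-ins written for illustration, and the `update_state(unfinished, schedulable)` signature is a simplification, not the real `DagRun.update_state()`. Only the metric name and tag keys come from this issue.

```python
from unittest import mock


class Stats:
    """Hypothetical stand-in for Airflow's Stats facade (illustration only)."""

    @staticmethod
    def incr(stat, count=1, rate=1, *, tags=None):
        # A real backend would forward this to statsd / OTel; here it is a no-op.
        pass


class FakeDagRun:
    """Hypothetical, simplified DagRun that models only the deadlock branch."""

    def __init__(self, dag_id, run_type):
        self.dag_id = dag_id
        self.run_type = run_type
        self.state = "running"

    def update_state(self, unfinished, schedulable):
        # Deadlock: there are unfinished tasks, but none of them can be scheduled.
        if unfinished and not schedulable:
            self.state = "failed"
            # The proposed one-line addition, next to the log + notify calls.
            Stats.incr(
                "dagrun.deadlocked",
                tags={"dag_id": self.dag_id, "run_type": self.run_type},
            )
        return self.state


def check_deadlock_emits_counter():
    """Patch Stats.incr and assert it fires exactly once on deadlock."""
    run = FakeDagRun("example_dag", "scheduled")
    with mock.patch.object(Stats, "incr") as incr:
        run.update_state(unfinished=["task_a"], schedulable=[])
    incr.assert_called_once_with(
        "dagrun.deadlocked",
        tags={"dag_id": "example_dag", "run_type": "scheduled"},
    )


check_deadlock_emits_counter()
```

In the real test, the patch target would presumably be the `Stats` object as imported in `dagrun.py`, and the deadlock would be set up through task instances rather than passed in as arguments.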
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's Code of Conduct