prince8273 opened a new pull request, #67077:
URL: https://github.com/apache/airflow/pull/67077
Currently the scheduler loop and heartbeat-timeout detection emit minimal
context, making production diagnosis of stalls, slot contention, and worker
crashes difficult.
### Changes
**`_do_scheduling()`**
- Captures `dag_runs_examined` after fetching active runs
- Emits `scheduler.dag_runs.examined` and `scheduler.executor.open_slots`
gauges
- Adds a structured summary log line before return
**`_purge_task_instances_without_heartbeats()`**
- Emits `scheduler.tasks.heartbeat_timeout` gauge
- Enriches the existing error log with `heartbeat_age_seconds`, `hostname`,
`pid`, and `task_running_seconds`
### Notes
- Additive only — no behavior change, no state transitions affected
- Follows existing `stats.gauge` naming and import conventions
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]