1fanwang opened a new pull request, #66810:
URL: https://github.com/apache/airflow/pull/66810

   ### Problem
   
   `scheduler_job_runner.py` emits gauges for pool slot states 
(`pool.open_slots`, `pool.queued_slots`, `pool.running_slots`, 
`pool.starving_tasks`). On most backends a gauge is last-write-wins between 
scrapes. If the scheduler emits several values within one scrape interval, only 
the last one survives, so a pool-pressure spike seen in one scheduler iteration 
can be overwritten by the next iteration before it is ever collected. The 
distribution within each scrape interval is lost, and there is no way to compute 
p50/p95/p99 of pool utilization from gauges alone.
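
   To make the loss concrete, here is a minimal sketch with made-up readings (not data from a real scheduler). It compares what a last-write-wins gauge exposes against keeping every sample and computing percentiles with the standard library's `statistics.quantiles`. The `inclusive` interpolation method is an arbitrary choice for the illustration; real metrics backends use their own percentile estimators.

```python
import statistics

# Hypothetical pool.queued_slots readings from five scheduler iterations that
# all fall inside ONE scrape interval (illustrative numbers, not measured data).
samples = [3, 4, 40, 5, 4]

# A last-write-wins gauge exposes only the final write before the scrape.
gauge_value = samples[-1]

# A distribution-style metric keeps every observation, so percentiles survive.
# With n=100, quantiles() returns 99 cut points; index k-1 is the k-th percentile.
cuts = statistics.quantiles(samples, n=100, method="inclusive")
p50, p95 = cuts[49], cuts[94]

print(gauge_value)  # 4: the spike to 40 queued slots is invisible
print(p50, p95)     # 4.0 33.0: the spike shows up in the tail
```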
   
   ### Fix
   
   Emit a matching `stats.timing` alongside each existing `stats.gauge` at the 
same call site, with the same value and same `pool_name` tag. The new metric 
name is the gauge name plus a `.distribution` suffix:
   
   - `pool.open_slots.distribution`
   - `pool.queued_slots.distribution`
   - `pool.running_slots.distribution`
   - `pool.starving_tasks.distribution`
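
   The call-site pattern can be sketched as follows, assuming a stats object that exposes `gauge(name, value, tags=...)` and `timing(name, value, tags=...)`. `RecordingStats` and `emit_gauge_and_distribution` are hypothetical names for illustration, not Airflow's actual classes or helpers, and the slot values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingStats:
    """Hypothetical stand-in for the stats backend; it only records calls."""

    calls: list = field(default_factory=list)

    def gauge(self, stat, value, tags=None):
        self.calls.append(("gauge", stat, value, tags))

    def timing(self, stat, value, tags=None):
        self.calls.append(("timing", stat, value, tags))


def emit_gauge_and_distribution(stats, name, value, pool_name):
    """Hypothetical helper: emit the existing gauge plus a `.distribution` twin."""
    tags = {"pool_name": pool_name}
    stats.gauge(name, value, tags=tags)                     # unchanged gauge
    stats.timing(f"{name}.distribution", value, tags=tags)  # same value, same tags


stats = RecordingStats()
for metric, value in [("pool.open_slots", 96), ("pool.queued_slots", 4)]:
    emit_gauge_and_distribution(stats, metric, value, pool_name="default_pool")

for call in stats.calls:
    print(call)
```

   Deriving the distribution name from the gauge name in one place keeps the two metrics from drifting apart if a gauge is ever renamed.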
   
   `stats.timing` is the histogram-shape primitive in the existing stats API. On 
the wire statsd sends it as a timer (`|ms`), whose samples the aggregator 
summarizes as a distribution instead of keeping only the last value; in the 
metrics YAML registry it maps to a `timer`. The gauges are unchanged, so existing 
scrapers keep working.
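
   The difference is visible in the plain statsd line protocol, `<name>:<value>|<type>`, where `g` marks a gauge and `ms` a timer. The sketch below formats one datagram of each kind. `statsd_line` is a hypothetical helper; tags are left out because plain statsd does not define them (extensions such as DogStatsD add their own syntax), and any configured metric prefix is omitted.

```python
def statsd_line(name: str, value: float, kind: str) -> str:
    """Format one plain statsd datagram (hypothetical helper; no tags, no prefix)."""
    type_codes = {"gauge": "g", "timer": "ms"}
    return f"{name}:{value}|{type_codes[kind]}"


print(statsd_line("pool.open_slots", 96, "gauge"))               # pool.open_slots:96|g
print(statsd_line("pool.open_slots.distribution", 96, "timer"))  # pool.open_slots.distribution:96|ms
```

   A statsd server keeps only the latest gauge value per flush interval but aggregates every timer sample it receives in that interval, which is why sending the same number as `|ms` preserves the spread that the `|g` line discards.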
   
   The four new metric names are registered in 
`shared/observability/src/airflow_shared/observability/metrics/metrics_template.yaml`
 with `type: timer` and matching `legacy_name` entries, so the prek 
registry-sync check passes and the metrics rst docs pick them up automatically.
   
   ### Tests
   
   `test_emit_pool_metrics_emits_gauge_and_histogram` mocks the stats backend, 
invokes `_emit_pool_metrics` with a freshly inserted pool, and asserts that for 
each of `pool.open_slots`, `pool.queued_slots`, `pool.running_slots`, the gauge 
and histogram calls fire with the same value and tags. The existing 
`test_emit_pool_starving_tasks_metrics` is unchanged and still passes (the new 
histogram call sits below the existing gauge call in the same loop).
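
   The shape of that assertion can be sketched with the standard library's `unittest.mock`. The real test targets `_emit_pool_metrics` against the scheduler's stats backend; here `emit_pool_metrics`, the pool name, and the slot counts are hypothetical stand-ins so the sketch runs on its own.

```python
from unittest import mock


def emit_pool_metrics(stats, pools):
    """Hypothetical stand-in for the scheduler's pool-metric emission loop."""
    for pool_name, slot_counts in pools.items():
        tags = {"pool_name": pool_name}
        for key in ("open", "queued", "running"):
            name = f"pool.{key}_slots"
            stats.gauge(name, slot_counts[key], tags=tags)
            stats.timing(f"{name}.distribution", slot_counts[key], tags=tags)


def test_gauge_and_distribution_pairs():
    stats = mock.MagicMock()
    emit_pool_metrics(stats, {"test_pool": {"open": 5, "queued": 2, "running": 3}})
    tags = {"pool_name": "test_pool"}
    for name, value in [
        ("pool.open_slots", 5),
        ("pool.queued_slots", 2),
        ("pool.running_slots", 3),
    ]:
        # Each gauge must have a distribution twin with the same value and tags.
        stats.gauge.assert_any_call(name, value, tags=tags)
        stats.timing.assert_any_call(f"{name}.distribution", value, tags=tags)


test_gauge_and_distribution_pairs()
```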
   
   Closes #66800
   

