Greg Neiheisel created AIRFLOW-3177:
---------------------------------------
Summary: Change scheduler_heartbeat metric from gauge to counter
Key: AIRFLOW-3177
URL: https://issues.apache.org/jira/browse/AIRFLOW-3177
Project: Apache Airflow
Issue Type: Improvement
Components: scheduler
Reporter: Greg Neiheisel
Assignee: Greg Neiheisel
Currently, the scheduler_heartbeat metric exposed with the statsd integration
is a gauge. I'm proposing to change the gauge to a counter for better
integration with Prometheus via the
[statsd_exporter|https://github.com/prometheus/statsd_exporter].
Rather than pointing Airflow at an actual statsd server, you can point it at
this exporter, which accumulates the metrics and exposes them at /metrics for
Prometheus to scrape. The problem is that once the scheduler sets this value on
its first loop, it is always exposed to Prometheus as 1. The scheduler can
crash or be turned off, and the statsd exporter will keep reporting 1 until the
exporter itself is restarted and rebuilds its internal state.
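
To illustrate the stale gauge, here is a minimal sketch in Python. The
LastValueStore class and its methods are hypothetical and model only the
behavior described above (the last gauge value is kept until the process
restarts); this is not statsd_exporter's actual code. The only real detail is
the statsd line format, "<name>:<value>|g" for a gauge.

```python
class LastValueStore:
    """Hypothetical stand-in for an exporter that remembers gauge values."""

    def __init__(self):
        self.gauges = {}

    def receive(self, line):
        # Parse a statsd line of the form "<name>:<value>|g".
        name, rest = line.split(":", 1)
        value, kind = rest.split("|", 1)
        if kind == "g":
            self.gauges[name] = float(value)

    def scrape(self, name):
        # Return whatever was stored last; this model has no notion of staleness.
        return self.gauges.get(name)


store = LastValueStore()
store.receive("scheduler_heartbeat:1|g")  # sent on the scheduler's first loop
while_running = store.scrape("scheduler_heartbeat")

# The scheduler crashes here, so no further lines arrive.
after_crash = [store.scrape("scheduler_heartbeat") for _ in range(3)]
```

In this model, the scrapes taken after the crash are indistinguishable from
the one taken while the scheduler was running.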
By turning this metric into a counter, we can detect an issue with the
scheduler by graphing and alerting on its rate of change. If the counter's rate
drops below the expected value (determined by the scheduler_heartbeat_secs
setting), we can fire an alert.
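
A rough sketch of the rate check in Python, assuming the counter is
incremented once per heartbeat so that the expected rate is
1 / scheduler_heartbeat_secs per second. The function names, the
(timestamp, value) sample format, and the tolerance factor are all
hypothetical. In practice the rate would come from a Prometheus query, which
also handles counter resets; this sketch does not.

```python
def heartbeat_rate(samples):
    """Per-second increase between the first and last (timestamp, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time window")
    return (v1 - v0) / (t1 - t0)


def should_alert(samples, scheduler_heartbeat_secs, tolerance=0.5):
    """Alert when the observed rate falls below a fraction of the expected rate.

    Assumption: one counter increment per heartbeat interval. The tolerance
    is an arbitrary illustrative value, not a recommended setting.
    """
    expected = 1.0 / scheduler_heartbeat_secs
    return heartbeat_rate(samples) < expected * tolerance


# Counter scraped at t=0s and t=60s, with scheduler_heartbeat_secs = 5.
healthy = [(0, 0), (60, 12)]   # 12 heartbeats in 60 seconds
stalled = [(0, 12), (60, 12)]  # counter stopped moving

healthy_alert = should_alert(healthy, scheduler_heartbeat_secs=5)
stalled_alert = should_alert(stalled, scheduler_heartbeat_secs=5)
```

With a counter, a stopped scheduler shows up as a flat line whose rate is
zero, which is what makes the alert possible.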
This should be helpful for adoption in Kubernetes environments where Prometheus
is pretty much the standard.