Greg Neiheisel created AIRFLOW-3177:
---------------------------------------

             Summary: Change scheduler_heartbeat metric from gauge to counter
                 Key: AIRFLOW-3177
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3177
             Project: Apache Airflow
          Issue Type: Improvement
          Components: scheduler
            Reporter: Greg Neiheisel
            Assignee: Greg Neiheisel


Currently, the scheduler_heartbeat metric exposed with the statsd integration 
is a gauge. I'm proposing to change the gauge to a counter for a better 
integration with Prometheus via the 
[statsd_exporter|https://github.com/prometheus/statsd_exporter].

Rather than pointing Airflow at an actual statsd server, you can point it at 
this exporter, which will accumulate the metrics and expose them to be scraped 
by Prometheus at /metrics. The problem is that once the scheduler sets this 
value on its first loop, the exporter will always expose it to Prometheus as 1. 
The scheduler can crash or be turned off, and the statsd exporter will keep 
reporting 1 until the exporter itself is restarted and rebuilds its internal 
state.
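
To illustrate the problem, here is a minimal Python sketch of how an aggregator 
that remembers the last value of a gauge differs from one that accumulates a 
counter. The ToyExporter class and run_scheduler function are hypothetical and 
only model the idea; this is not how the statsd_exporter is actually 
implemented.

```python
class ToyExporter:
    """Hypothetical stand-in for an aggregator such as the statsd_exporter."""

    def __init__(self):
        self.gauges = {}
        self.counters = {}

    def gauge(self, name, value):
        # A gauge keeps only the most recently reported value.
        self.gauges[name] = value

    def incr(self, name, amount=1):
        # A counter accumulates every increment it receives.
        self.counters[name] = self.counters.get(name, 0) + amount


def run_scheduler(exporter, loops):
    """Simulate `loops` scheduler loops, reporting the heartbeat both ways."""
    for _ in range(loops):
        exporter.gauge("scheduler_heartbeat", 1)
        exporter.incr("scheduler_heartbeat")


exporter = ToyExporter()
run_scheduler(exporter, loops=3)
gauge_while_running = exporter.gauges["scheduler_heartbeat"]
counter_while_running = exporter.counters["scheduler_heartbeat"]

# The scheduler "crashes" here: no further metrics are sent,
# but the exporter still holds whatever it saw last.
gauge_after_crash = exporter.gauges["scheduler_heartbeat"]
counter_after_crash = exporter.counters["scheduler_heartbeat"]

# The gauge reads the same before and after the crash, so it carries no
# liveness signal. The counter has stopped growing, which a rate can detect.
print(gauge_while_running, gauge_after_crash)
print(counter_while_running, counter_after_crash)
```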

By turning this metric into a counter, we can detect an issue with the 
scheduler by graphing and alerting on its rate. If the counter's rate of change 
drops below the expected rate (determined by the scheduler_heartbeat_secs 
setting), we can fire an alert.
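
As a rough sketch of the alerting idea, the check could look something like the 
Python below. In practice this would more likely be a Prometheus alerting rule; 
the function names, the (timestamp, value) sample format, and the tolerance 
factor are all assumptions made for illustration, and counter resets are not 
handled.

```python
def heartbeat_rate(samples):
    """Per-second increase of the counter between the first and last sample.

    `samples` is a list of (timestamp_seconds, counter_value) pairs,
    ordered by time. Counter resets are ignored in this sketch.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time window")
    return (v1 - v0) / (t1 - t0)


def should_alert(samples, heartbeat_secs, tolerance=0.5):
    """Return True if the observed rate is well below the expected rate.

    A scheduler that heartbeats every `heartbeat_secs` seconds should
    increment the counter about 1 / heartbeat_secs times per second.
    `tolerance` is a hypothetical slack factor to avoid flapping.
    """
    expected_rate = 1.0 / heartbeat_secs
    return heartbeat_rate(samples) < expected_rate * tolerance


# Healthy: 12 heartbeats in 60 seconds with a 5 second heartbeat interval.
healthy = [(0, 0), (60, 12)]
# Stalled: the counter did not move during the window.
stalled = [(0, 12), (60, 12)]

print(should_alert(healthy, heartbeat_secs=5))
print(should_alert(stalled, heartbeat_secs=5))
```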

This should be helpful for adoption in Kubernetes environments where Prometheus 
is pretty much the standard.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
