[
https://issues.apache.org/jira/browse/AIRFLOW-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647515#comment-16647515
]
ASF GitHub Bot commented on AIRFLOW-3177:
-----------------------------------------
Fokko closed pull request #4027: [AIRFLOW-3177] Change scheduler_heartbeat from gauge to counter
URL: https://github.com/apache/incubator-airflow/pull/4027
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
diff --git a/UPDATING.md b/UPDATING.md
index 74337f3fe8..5e1402576b 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -52,6 +52,10 @@ To delete a user:
 airflow users --delete --username jondoe
 ```
 
+### StatsD Metrics
+
+The `scheduler_heartbeat` metric has been changed from a gauge to a counter. Each loop of the scheduler will increment the counter by 1. This provides a higher degree of visibility and allows for better integration with Prometheus using the [StatsD Exporter](https://github.com/prometheus/statsd_exporter). Scheduler upness can be determined by graphing and alerting using a rate. If the scheduler goes down, the rate will drop to 0.
+
 ### Custom auth backends interface change
 
 We have updated the version of flask-login we depend upon, and as a result any
diff --git a/airflow/jobs.py b/airflow/jobs.py
index b224f75545..3922939a86 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1895,7 +1895,7 @@ def process_file(self, file_path, pickle_dags=False, session=None):
 
     @provide_session
     def heartbeat_callback(self, session=None):
-        Stats.gauge('scheduler_heartbeat', 1, 1)
+        Stats.incr('scheduler_heartbeat', 1, 1)
 
 
 class BackfillJob(BaseJob):
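
To make the effect of this one-line change concrete, here is a toy model of how a StatsD-to-Prometheus bridge would see each metric type. It is a sketch under stated assumptions, not the statsd_exporter code: `statsd_line`, `ToyExporter`, and `scrape_series` are hypothetical names, and the model assumes the exporter keeps the last value of a gauge and a running total of a counter, never expiring either. Datagrams use the plain StatsD `name:value|type` format (`g` for a gauge, `c` for a counter). A real Airflow deployment would also prepend its configured StatsD prefix to the name.

```python
from typing import Dict, List

# Hypothetical toy model of an exporter; not the real statsd_exporter.


def statsd_line(name: str, value: int, metric_type: str) -> str:
    """Format a plain StatsD datagram such as 'scheduler_heartbeat:1|c'."""
    return f"{name}:{value}|{metric_type}"


class ToyExporter:
    """Keeps the latest gauge value and a running counter total, forever."""

    def __init__(self) -> None:
        self.values: Dict[str, float] = {}

    def ingest(self, line: str) -> None:
        name, rest = line.split(":", 1)
        value_str, metric_type = rest.split("|", 1)
        value = float(value_str)
        if metric_type == "g":
            # Gauge: overwrite with the most recent value.
            self.values[name] = value
        elif metric_type == "c":
            # Counter: add the increment to the running total.
            self.values[name] = self.values.get(name, 0.0) + value
        else:
            raise ValueError(f"unsupported metric type: {metric_type!r}")


def scrape_series(metric_type: str, loops: List[bool]) -> List[float]:
    """Return what a scraper would read after each tick.

    loops[i] is True if the scheduler completed a loop during tick i.
    """
    exporter = ToyExporter()
    readings = []
    for ran in loops:
        if ran:
            exporter.ingest(statsd_line("scheduler_heartbeat", 1, metric_type))
        readings.append(exporter.values.get("scheduler_heartbeat", 0.0))
    return readings
```

With three scheduler loops followed by three missed ones, `scrape_series("g", [True] * 3 + [False] * 3)` reads 1.0 at every scrape, so a dead scheduler looks healthy. `scrape_series("c", ...)` climbs to 3.0 and then stays flat, and a flat counter is exactly what a rate-based alert can catch.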
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
> Change scheduler_heartbeat metric from gauge to counter
> -------------------------------------------------------
>
> Key: AIRFLOW-3177
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3177
> Project: Apache Airflow
> Issue Type: Improvement
> Components: scheduler
> Affects Versions: 2.0.0
> Reporter: Greg Neiheisel
> Assignee: Greg Neiheisel
> Priority: Minor
> Fix For: 1.10.1
>
>
> Currently, the scheduler_heartbeat metric exposed through the StatsD integration
> is a gauge. I'm proposing to change the gauge to a counter for better
> integration with Prometheus via the
> [statsd_exporter|https://github.com/prometheus/statsd_exporter].
> Rather than pointing Airflow at an actual StatsD server, you can point it at
> this exporter, which accumulates the metrics and exposes them at /metrics to
> be scraped by Prometheus. The problem is that once this value is set, when
> the scheduler runs its first loop, it is always exposed to Prometheus as 1.
> The scheduler can crash or be turned off, and the exporter will keep
> reporting 1 until the exporter itself is restarted and rebuilds its internal
> state.
> By turning this metric into a counter, we can detect an issue with the
> scheduler by graphing and alerting on its rate. If the counter's rate of
> increase drops below the expected rate (determined by the
> scheduler_heartbeat_sec setting), we can fire an alert.
> This should help adoption in Kubernetes environments, where Prometheus is
> the de facto standard for monitoring.
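
The rate-based alert described in the issue can be sketched as follows, assuming scrapes that yield `(timestamp, counter_value)` pairs. `counter_rate` is a simplified stand-in for Prometheus's `rate()`: it handles counter resets but does not extrapolate to the edges of the window. `scheduler_looks_down` and its 50% `tolerance` are a hypothetical alert rule, and the expected rate assumes roughly one scheduler loop per `scheduler_heartbeat_sec` seconds.

```python
from typing import List, Tuple

# (unix_timestamp_seconds, counter_value) as scraped from /metrics.
Sample = Tuple[float, float]


def counter_rate(samples: List[Sample]) -> float:
    """Average per-second increase of a counter across the samples.

    Simplified take on Prometheus's rate(): a drop in value is treated as a
    counter reset (e.g. the exporter restarted), but there is no
    extrapolation to the edges of the window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so count the new value.
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


def scheduler_looks_down(
    samples: List[Sample], heartbeat_sec: float, tolerance: float = 0.5
) -> bool:
    """Hypothetical alert rule: fire when the heartbeat rate falls below
    `tolerance` times the expected rate of one loop per heartbeat_sec."""
    expected_rate = 1.0 / heartbeat_sec
    return counter_rate(samples) < tolerance * expected_rate
```

For example, with `heartbeat_sec=5` and a scrape every 15 seconds, a healthy series such as `[(0, 0), (15, 3), (30, 6), (45, 9), (60, 12)]` gives a rate of 0.2 per second and no alert, while a flat series stuck at 12 gives a rate of 0 and fires. In practice Prometheus would evaluate the same idea natively with `rate()` in an alerting rule; the Python version only spells out the arithmetic.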
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)