gianm commented on issue #17932:
URL: https://github.com/apache/druid/issues/17932#issuecomment-2820224875

   > when we work timeseries monitoring system like Prometheus, the old metric 
of `service/heartbeat` with `leader=0` would still be present with a value of 
1, and it doesn't get decremented
   
   Is there some way to get Prometheus to not do this, or to transform the 
metric somehow on the way into Prometheus? I'd rather not need to design our 
metrics with Prometheus in mind. Mostly they're designed with Druid itself in 
mind as the storage backend for its own metrics. With Druid as the backend, the 
current design is pretty natural, and you would do something like this to find 
double leader situations:
   
   ```sql
   SELECT
     FLOOR(__time TO MINUTE),
     COUNT(DISTINCT "host") num_servers,
     COUNT(DISTINCT "host") FILTER(WHERE "leader" = 1) num_leaders
   FROM "druid_metrics" -- placeholder name: use the datasource your Druid metrics are ingested into
   WHERE
     TIME_IN_INTERVAL(__time, '2000-01-01/P2D') -- interval you want to look at
     AND "metric" = 'service/heartbeat'
     AND "service" = 'druid/overlord'
   GROUP BY 1
   HAVING num_leaders > 1
   ```
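
To make the logic of the query above concrete, here is a minimal Python sketch of the same aggregation over in-memory rows. It is only an illustration: the row layout (dicts with `time`, `host`, `leader`) and the function name `find_double_leader_minutes` are hypothetical, not anything Druid provides, and the rows are assumed to be already filtered to `service/heartbeat` events from `druid/overlord`.

```python
from collections import defaultdict
from datetime import datetime


def find_double_leader_minutes(events):
    """Return {minute: (num_servers, num_leaders)} for minutes with more than one leader.

    Each event is a hypothetical dict with keys "time" (datetime), "host" (str)
    and "leader" (0 or 1), mirroring the dimensions used in the SQL query.
    """
    servers = defaultdict(set)  # minute -> distinct hosts that reported
    leaders = defaultdict(set)  # minute -> distinct hosts that reported leader = 1
    for event in events:
        # Equivalent of FLOOR(__time TO MINUTE)
        minute = event["time"].replace(second=0, microsecond=0)
        servers[minute].add(event["host"])
        if event["leader"] == 1:
            leaders[minute].add(event["host"])
    # Equivalent of HAVING num_leaders > 1
    return {
        minute: (len(hosts), len(leaders[minute]))
        for minute, hosts in servers.items()
        if len(leaders[minute]) > 1
    }


# Made-up sample data: two Overlords, both claiming leadership in the second minute.
events = [
    {"time": datetime(2000, 1, 1, 0, 0, 10), "host": "overlord-a", "leader": 1},
    {"time": datetime(2000, 1, 1, 0, 0, 40), "host": "overlord-b", "leader": 0},
    {"time": datetime(2000, 1, 1, 0, 1, 5), "host": "overlord-a", "leader": 1},
    {"time": datetime(2000, 1, 1, 0, 1, 30), "host": "overlord-b", "leader": 1},
]
result = find_double_leader_minutes(events)
```

Because each heartbeat is treated as an independent event and counted per time bucket, a host that stops being leader simply stops contributing `leader = 1` rows; nothing needs to be decremented.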


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

