abhishekrb19 commented on code in PR #18598:
URL: https://github.com/apache/druid/pull/18598#discussion_r2411358991


##########
extensions-contrib/prometheus-emitter/src/main/java/org/apache/druid/emitter/prometheus/PrometheusEmitter.java:
##########
@@ -256,4 +279,21 @@ public void setPushGateway(PushGateway pushGateway)
   {
     this.pushGateway = pushGateway;
   }
+
+  private void cleanUpStaleMetrics()
+  {
+    if (config.getFlushPeriod() == null) {
+      return;
+    }
+
+    Map<String, DimensionsAndCollector> map = metrics.getRegisteredMetrics();
+    for (Map.Entry<String, DimensionsAndCollector> entry : map.entrySet()) {
+      if (entry.getValue().isExpired(config.getFlushPeriod())) {
+        log.debug("Metric [%s] has expired (last updated %d ms ago)",
+                 entry.getKey(),
+                 entry.getValue().getTimeSinceLastUpdate());
+        entry.getValue().getCollector().clear();

Review Comment:
   >Hmm, I suppose removing the children for specific labels would be a more 
complete solution.
   But it could blow up the memory usage pretty fast since we would now be 
tracking the last updated time for every unique set of label values for every 
metric.
   
   
   Yeah, good callout on the memory usage when tracking every unique 
combination of label sets per metric. Prometheus isn’t designed for very [high 
cardinality 
labels](https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels),
 and the Prometheus `SimpleCollector` in the java library already tracks the 
unique label sets (`children`) internally. On the extension side, we’d only be 
dealing with a subset of those `children`, since we’d remove the ephemeral 
“stale” entries anyway (if there's a way to leverage the collector’s internal 
state somehow, that would be even better). So I’m not too concerned about the 
additional memory this would require, though it might still be worth validating 
in a cluster with some labels with reasonable cardinality and load.
   
   I’m also okay with the suggestion to start without labels for now and 
revisit this (or another solution) once we have a better understanding of the 
additional memory overhead it might introduce. The current implementation 
should remain extensible enough to support label tracking in the future if 
needed. We can keep [#14638](https://github.com/apache/druid/issues/14638)
    open and call out in the documentation that the TTL expiration only applies 
to certain metric types and not to individual timeseries'. 
   @aho135 @kfaraz what do you think?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to