abhishekrb19 commented on code in PR #14292:
URL: https://github.com/apache/druid/pull/14292#discussion_r1200876981


##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -4220,6 +4220,18 @@ protected void emitLag()
           return;
         }
 
+        // Try emitting lag even with stale metrics provided that none of the partitions has negative lag
+        final boolean areOffsetsStale =
+            sequenceLastUpdated != null
+            && sequenceLastUpdated.getMillis()
+               < System.currentTimeMillis() - tuningConfig.getOffsetFetchPeriod().getMillis();
+        if (areOffsetsStale && partitionLags.values().stream().anyMatch(x -> x < 0)) {
+          log.warn("Lag is negative and will not be emitted because topic offsets have become stale. "
+                   + "This will not impact data processing. "
+                   + "Offsets may become stale because of connectivity issues.");
+          return;

Review Comment:
   Yeah, a per-partition lag metric would complement the existing metrics. My 
   main concern with not reporting _any_ lag for a topic in this scenario is that 
   we'd have gaps in the lag data for as long as at least one partition in the 
   topic is stale. Missing metrics can silently hide problems and break alerting 
   for downstream consumers of these metrics. What do you think?
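
   To illustrate the alternative, here is a minimal sketch (not Druid code; 
   `PartialLagSketch` and `summarizeValidLags` are hypothetical names, and the 
   `Map<Integer, Long>` shape for per-partition lag is an assumption) that drops 
   only the negative per-partition lags and still aggregates the rest, so a 
   topic-level value could be emitted instead of nothing:

```java
import java.util.LongSummaryStatistics;
import java.util.Map;

// Hypothetical helper for illustration only; not part of SeekableStreamSupervisor.
final class PartialLagSketch
{
  private PartialLagSketch() {}

  // Summarizes lag over the partitions whose lag is non-negative.
  // Negative (or missing) values are treated as stale and skipped
  // rather than suppressing the whole topic's metric.
  static LongSummaryStatistics summarizeValidLags(Map<Integer, Long> partitionLags)
  {
    return partitionLags.values()
                        .stream()
                        .filter(lag -> lag != null && lag >= 0)
                        .mapToLong(Long::longValue)
                        .summaryStatistics();
  }
}
```

   With lags of 120, -5 and 30 for three partitions, the summary would cover two 
   partitions with a total of 150 and a max of 120. Whether a partial total is 
   acceptable for alerting is a separate question; `getCount()` would at least 
   let the caller see how many partitions were included.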



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

