abhishekrb19 commented on code in PR #14292:
URL: https://github.com/apache/druid/pull/14292#discussion_r1198412627


##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -4220,6 +4220,18 @@ protected void emitLag()
           return;
         }
 
+        // Try emitting lag even with stale metrics provided that none of the 
partitions has negative lag
+        final boolean areOffsetsStale =
+            sequenceLastUpdated != null
+            && sequenceLastUpdated.getMillis()
+               < System.currentTimeMillis() - 
tuningConfig.getOffsetFetchPeriod().getMillis();
+        if (areOffsetsStale && partitionLags.values().stream().anyMatch(x -> x 
< 0)) {
+          log.warn("Lag is negative and will not be emitted because topic 
offsets have become stale. "

Review Comment:
   For troubleshooting, I think it'll also be good to log the topic:partition 
info where the offsets may potentially be  stale



##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -4220,6 +4220,18 @@ protected void emitLag()
           return;
         }
 
+        // Try emitting lag even with stale metrics provided that none of the 
partitions has negative lag

Review Comment:
   ```suggestion
           // Try emitting lag even with stale metrics provided that none of 
the partitions have negative lag
   ```



##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -4220,6 +4220,18 @@ protected void emitLag()
           return;
         }
 
+        // Try emitting lag even with stale metrics provided that none of the 
partitions has negative lag
+        final boolean areOffsetsStale =
+            sequenceLastUpdated != null
+            && sequenceLastUpdated.getMillis()
+               < System.currentTimeMillis() - 
tuningConfig.getOffsetFetchPeriod().getMillis();
+        if (areOffsetsStale && partitionLags.values().stream().anyMatch(x -> x 
< 0)) {
+          log.warn("Lag is negative and will not be emitted because topic 
offsets have become stale. "
+                   + "This will not impact data processing. "
+                   + "Offsets may become stale because of connectivity 
issues.");
+          return;

Review Comment:
   Should we skip emitting lag metrics only for the stale partitions? I think 
in general, it'll be helpful to emit metrics for partitions that have non-zero 
lag. For example, if a topic's partitions are spread across multiple brokers 
and only some have connectivity issues. Or for a topic where some partitions 
receive little to no data, those may selectively be considered "stale". 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to