abhishekrb19 commented on code in PR #14292:
URL: https://github.com/apache/druid/pull/14292#discussion_r1200876981
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -4220,6 +4220,18 @@ protected void emitLag()
return;
}
+ // Try emitting lag even with stale metrics provided that none of the partitions has negative lag
+ final boolean areOffsetsStale =
+ sequenceLastUpdated != null
+ && sequenceLastUpdated.getMillis()
+ < System.currentTimeMillis() - tuningConfig.getOffsetFetchPeriod().getMillis();
+ if (areOffsetsStale && partitionLags.values().stream().anyMatch(x -> x < 0)) {
+ log.warn("Lag is negative and will not be emitted because topic offsets have become stale. "
+ + "This will not impact data processing. "
+ + "Offsets may become stale because of connectivity issues.");
+ return;
Review Comment:
Yeah, a per-partition lag metric would complement the existing metrics. My
main concern with not reporting _any_ lag for a topic in this scenario is that
we'd have gaps in the lag data for as long as at least one partition in the
topic is stale. Missing metrics data can silently hide problems and affect
alerting for downstream consumers of these metrics. What do you think?
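
To make the concern concrete, here is a minimal sketch of one possible alternative: drop only the partitions with negative lag and still report aggregates over the rest, so the topic-level metric has no gaps. The class `PartialLagSketch` and its methods are hypothetical names for illustration, not part of `SeekableStreamSupervisor`, and whether a partial total is acceptable for alerting is an assumption that would need discussion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Druid code: illustrates emitting lag for the
// partitions that still have a valid (non-negative) value.
class PartialLagSketch
{
  // Keep only entries with a non-null, non-negative lag.
  static <K> Map<K, Long> validLags(Map<K, Long> partitionLags)
  {
    final Map<K, Long> valid = new LinkedHashMap<>();
    for (Map.Entry<K, Long> entry : partitionLags.entrySet()) {
      final Long lag = entry.getValue();
      if (lag != null && lag >= 0) {
        valid.put(entry.getKey(), lag);
      }
    }
    return valid;
  }

  // Sum of lag across the partitions that passed the filter.
  static long totalLag(Map<?, Long> validLags)
  {
    return validLags.values().stream().mapToLong(Long::longValue).sum();
  }

  // Largest lag across the partitions that passed the filter, or 0 if none did.
  static long maxLag(Map<?, Long> validLags)
  {
    return validLags.values().stream().mapToLong(Long::longValue).max().orElse(0L);
  }
}
```

With this shape, a caller could compute the total and max lag from `validLags(partitionLags)` and log how many partitions were skipped, instead of returning early for the whole topic.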
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]