AmatyaAvadhanula commented on code in PR #13144:
URL: https://github.com/apache/druid/pull/13144#discussion_r982280416
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -2502,11 +2538,48 @@ private boolean updatePartitionDataFromStream()
return true;
}
+ public long computeTotalLag()
+ {
+ if (isIdle()) {
+ Map<PartitionIdType, SequenceOffsetType> oldOffsets =
getOffsetsFromMetadataStorage();
+ return computeLagStatsForOffsets(oldOffsets).getTotalLag();
Review Comment:
> Supervisor is idle: current offsets are fetched from metadata storage
since no tasks are likely running.
Couldn't this lead to a situation where all actively running tasks could
return 0 lag as they have caught up, but one or more of them may not have
checkpointed to the metadata store?
This will lead to a situation where an IDLE supervisor returns to RUNNING
due to the discrepancy when computing lag using only the metadata store.
I think that for each partition in the topic, the current offset must be
retrieved from the active task it has been assigned to, with a fallback to the
metadata store.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]