navina commented on code in PR #9977:
URL: https://github.com/apache/pinot/pull/9977#discussion_r1047540796
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -255,6 +255,11 @@ public void deleteSegmentFile() {
private StreamPartitionMsgOffset _finalOffset; // Used when we want to catch up to this one
private volatile boolean _shouldStop = false;
+ /** This variable will be set by the configured {@literal IngestionBasedConsumptionStatusChecker} when the segment is
+ * caughtup.
+ */
+ private volatile boolean _caughtUpWithUpstream = false;
Review Comment:
Very good questions. I think we will encounter an increase in the lag
during a traffic spike. However, we will be monitoring the "trend" of the lag,
which should always be trending down or remaining stable.
The problem with pause/resume is that the segment or table data manager
doesn't know whether the consumer has been paused, because a pause is simply
treated as a force commit. I think we still want to have visibility when the
stream is paused. But the other metric, `LLC_PARTITION_CONSUMING`, will not be
true in this case. So, the monitoring rule should be something like:
ALERT IF (`LLC_PARTITION_CONSUMING` == 1 && `availabilityLagMs` is trending up) for a period of 15 min
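
To make the rule concrete, here is a rough Java sketch of how it could be evaluated over a window of samples. It is only an illustration: `LagTrendAlert`, `Sample`, and `shouldAlert` are hypothetical names that do not exist in Pinot, and "trending up" is assumed here to mean a positive least-squares slope of `availabilityLagMs` over the window. In practice the rule would more likely be written in the monitoring system's own query language.

```java
import java.util.List;

/** Hypothetical helper illustrating the alert rule; not part of Pinot. */
final class LagTrendAlert {

  /** One metric sample for a single partition (hypothetical shape). */
  static final class Sample {
    final long timestampMs;        // when the sample was taken
    final boolean consuming;       // true when LLC_PARTITION_CONSUMING == 1
    final long availabilityLagMs;  // lag reported at that time

    Sample(long timestampMs, boolean consuming, long availabilityLagMs) {
      this.timestampMs = timestampMs;
      this.consuming = consuming;
      this.availabilityLagMs = availabilityLagMs;
    }
  }

  private LagTrendAlert() {
  }

  /**
   * Returns true when the samples (assumed sorted by timestamp) span at least
   * windowMs, the partition was consuming at every sample, and the lag has a
   * positive least-squares slope. A paused partition never alerts.
   */
  static boolean shouldAlert(List<Sample> samples, long windowMs) {
    if (samples.size() < 2) {
      return false;
    }
    long spanMs = samples.get(samples.size() - 1).timestampMs - samples.get(0).timestampMs;
    if (spanMs < windowMs) {
      return false; // not enough history to judge a trend
    }
    for (Sample s : samples) {
      if (!s.consuming) {
        return false; // paused or not consuming: the rule does not apply
      }
    }
    return lagSlope(samples) > 0;
  }

  /** Least-squares slope of lag over time; 0 if all timestamps are equal. */
  static double lagSlope(List<Sample> samples) {
    double n = samples.size();
    double meanT = 0;
    double meanLag = 0;
    for (Sample s : samples) {
      meanT += s.timestampMs;
      meanLag += s.availabilityLagMs;
    }
    meanT /= n;
    meanLag /= n;
    double cov = 0;
    double var = 0;
    for (Sample s : samples) {
      double dt = s.timestampMs - meanT;
      cov += dt * (s.availabilityLagMs - meanLag);
      var += dt * dt;
    }
    return var == 0 ? 0 : cov / var;
  }
}
```

With this interpretation, a stable or decreasing lag never fires, and a short spike only fires if it dominates the whole 15 min window.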
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]