navina commented on code in PR #9977:
URL: https://github.com/apache/pinot/pull/9977#discussion_r1047540796


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -255,6 +255,11 @@ public void deleteSegmentFile() {
   private StreamPartitionMsgOffset _finalOffset; // Used when we want to catch up to this one
   private volatile boolean _shouldStop = false;
 
+  /** This variable will be set by the configured {@literal IngestionBasedConsumptionStatusChecker} when the segment is
+   * caughtup.
+   */
+  private volatile boolean _caughtUpWithUpstream = false;

Review Comment:
   Very good questions. I think we will see an increase in the lag during a 
traffic spike. However, we will be monitoring the "trend" of the lag, which 
should always be trending down or remaining stable. 
   
   The problem with pause/resume is that the segment or table data manager 
doesn't know that the consumer has been paused, because a pause is simply 
treated as a force commit. I think we still want visibility when the stream is 
paused. However, the other metric, `LLC_PARTITION_CONSUMING`, will not be true 
in that case, so the monitoring rule should be something like:
   ALERT IF (`LLC_PARTITION_CONSUMING` == 1 && `availabilityLagMs` is trending 
up) for a period of 15 min 
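
To make the rule concrete, here is a rough sketch of how it could be evaluated over one 15-minute window of samples. This is only an illustration: the class `LagTrendAlert` and its methods are hypothetical and not part of Pinot, and reading "trending up" as a positive least-squares slope is my assumption, since the rule itself does not define the trend.

```java
import java.util.List;

/** Hypothetical sketch of the alert rule; not part of Pinot. */
public class LagTrendAlert {

  /**
   * Least-squares slope of the lag samples, in ms per sample interval.
   * Assumes the samples are evenly spaced in time (e.g. one per minute).
   */
  static double slope(List<Long> lagMs) {
    int n = lagMs.size();
    if (n < 2) {
      return 0.0; // Not enough points to define a trend.
    }
    double meanX = (n - 1) / 2.0;
    double meanY = 0.0;
    for (long v : lagMs) {
      meanY += v;
    }
    meanY /= n;
    double num = 0.0;
    double den = 0.0;
    for (int i = 0; i < n; i++) {
      double dx = i - meanX;
      num += dx * (lagMs.get(i) - meanY);
      den += dx * dx;
    }
    return num / den;
  }

  /**
   * Returns true when the partition reported consuming (== 1) for every sample
   * in the window AND the lag trended up over that window.
   * Both lists are assumed to cover the same 15-minute window.
   */
  static boolean shouldAlert(List<Integer> consuming, List<Long> lagMs) {
    boolean consumingThroughout =
        !consuming.isEmpty() && consuming.stream().allMatch(c -> c == 1);
    return consumingThroughout && slope(lagMs) > 0;
  }
}
```

With this shape, a paused stream (any `LLC_PARTITION_CONSUMING` sample of 0 in the window) never fires the alert, and a spike that is already recovering does not fire either as long as the fitted slope over the window is not positive.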
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
