symious commented on pull request #3009:
URL: https://github.com/apache/ozone/pull/3009#issuecomment-1032153106


   @avijayanhwx Thanks for the review.
   For the lag, I think there are three scenarios:
   1. A spike: the lag should recover once the spike is over.
   2. A slowly increasing lag: for example, Recon consumes 20,000 records per 10 min while OM produces 20,500 records per 10 min. In this case the lag never goes away, so it's better for the user to increase the threshold, and we need to give users some numeric clue about how much to increase it.
   3. A fast-increasing lag: when Recon requests updates from OM with an outdated sequenceNumber, a `SequenceNumberNotFoundException` is thrown and Recon falls back to requesting a full snapshot. In this case I also think we should keep the threshold low, in case a huge batch of updates causes an OOM on OM.
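The arithmetic behind scenario 2 can be made concrete with a small Java sketch using the example rates above. The class and method names are hypothetical. The sketch assumes the lag starts at zero and both rates stay constant.

```java
// Hypothetical illustration of scenario 2: OM produces records slightly faster
// than Recon consumes them, so the lag grows by a fixed amount each interval.
public final class LagGrowth {

    // Lag (in records) after the given number of 10-minute intervals,
    // assuming an initial lag of zero and constant rates.
    static long lagAfter(long producedPerInterval, long consumedPerInterval, int intervals) {
        long growthPerInterval = producedPerInterval - consumedPerInterval;
        // If Recon consumes at least as fast as OM produces, the lag stays at zero.
        return Math.max(0L, growthPerInterval) * intervals;
    }

    public static void main(String[] args) {
        long produced = 20_500; // records produced by OM per 10 min (example rate)
        long consumed = 20_000; // records consumed by Recon per 10 min (example rate)

        // 6 intervals per hour; 6 * 24 intervals per day.
        System.out.println("Lag after 1 hour: " + lagAfter(produced, consumed, 6));
        System.out.println("Lag after 1 day:  " + lagAfter(produced, consumed, 6 * 24));
    }
}
```

With these numbers the lag grows by 500 records every 10 minutes, which is 3,000 records per hour and 72,000 per day. It only stops growing once Recon consumes at least 20,500 records per interval, presumably by raising the threshold. That figure is the kind of numeric clue users would need.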
   
   For lag monitoring, one option would be to add a new field to `DBUpdate` indicating the lag or the latestSequenceNumber, so that Recon can use this information to notify users. Those changes may not be closely related to this ticket; if this option is OK, I can create another ticket for it.
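The following minimal Java sketch shows how Recon might turn the proposed field into a lag value and a notification decision. `DeltaBatch`, `lastAppliedSequenceNumber`, `latestSequenceNumber`, `warnThreshold`, and `ReconLagMonitor` are hypothetical names used only for illustration. The actual `DBUpdate` type and its fields may differ.

```java
// Hypothetical sketch: DeltaBatch stands in for DBUpdate carrying the proposed
// latestSequenceNumber field; none of these names are existing Ozone APIs.
public final class ReconLagMonitor {

    // Minimal stand-in for an OM update batch with the proposed extra field.
    record DeltaBatch(long lastAppliedSequenceNumber, long latestSequenceNumber) { }

    private final long warnThreshold;

    ReconLagMonitor(long warnThreshold) {
        this.warnThreshold = warnThreshold;
    }

    // How many sequence numbers Recon is behind OM (never negative).
    static long lag(DeltaBatch batch) {
        return Math.max(0L, batch.latestSequenceNumber() - batch.lastAppliedSequenceNumber());
    }

    // True when the lag exceeds the threshold and the user should be notified.
    boolean shouldWarn(DeltaBatch batch) {
        return lag(batch) > warnThreshold;
    }

    public static void main(String[] args) {
        ReconLagMonitor monitor = new ReconLagMonitor(10_000);
        DeltaBatch batch = new DeltaBatch(1_000_000L, 1_015_000L);
        System.out.println("lag=" + lag(batch) + " warn=" + monitor.shouldWarn(batch));
    }
}
```

In this example the computed lag is 15,000, which exceeds the 10,000 threshold, so Recon would notify the user. Tracking the value across successive fetches would also help tell a spike (the lag falls back) from a steadily growing lag.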

