keith-turner commented on PR #3366:
URL: https://github.com/apache/accumulo/pull/3366#issuecomment-1531807723

   > Are there things that we should be checking for a tserver to validate that 
it meets some minimal level of healthy?
   
   We also have some sort of check for stuck compactions; I think it also just 
logs something.  Does it make sense to emit metrics for these things in 
addition to a log message?  Maybe each server could have an unhealthy count 
metric.  The implementation of the metric could survey known conditions and 
add one for each that is currently unhealthy.  If a compaction is currently 
stuck, that would contribute to the count when surveyed.  If the metadata 
validation with memory is having problems, that would contribute to the count 
when surveyed.  If things were unhealthy and then became healthy, the survey 
would count nothing and emit zero.
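
   A rough sketch of what such a survey-based count could look like, in plain 
Java.  The names here (`UnhealthySurvey`, `register`, `unhealthyCount`) are 
made up for illustration and are not existing Accumulo code; the real checks 
for stuck compactions or metadata validation would need to expose a "currently 
unhealthy" flag for this to work.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Accumulo code: counts the health checks that are
// failing at the moment the survey runs.
class UnhealthySurvey {
  private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

  // Register a named check. The supplier returns true while the condition
  // (e.g. a stuck compaction) is currently unhealthy.
  synchronized void register(String name, BooleanSupplier isUnhealthy) {
    checks.put(name, isUnhealthy);
  }

  // Survey every registered check and add one for each that is unhealthy now.
  // Nothing is accumulated between surveys, so the value returns to zero once
  // every condition has recovered.
  synchronized int unhealthyCount() {
    int count = 0;
    for (BooleanSupplier check : checks.values()) {
      if (check.getAsBoolean()) {
        count++;
      }
    }
    return count;
  }
}
```

   Because the value is recomputed on every survey rather than incremented, it 
would probably fit a gauge-style metric better than a monotonically increasing 
counter.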


