keith-turner commented on PR #3366: URL: https://github.com/apache/accumulo/pull/3366#issuecomment-1531807723
> Are there things that we should be checking for a tserver to validate that it meets some minimal level of health?

We also have some sort of check for stuck compactions; I think it just logs something as well. Does it make sense to emit metrics for these things in addition to a log message? Maybe each server could have an unhealthy count metric. The implementation of the metric could survey the known checks and add one for each that is currently unhealthy. If a compaction is currently stuck, that would contribute to the count when surveyed. If the metadata validation with memory is having problems, that would also contribute to the count when surveyed. If things were unhealthy and then become healthy, the survey would count nothing and emit zero.
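The surveyed-count idea could look roughly like the minimal sketch below. It assumes each known condition (a stuck compaction, a metadata validation problem) can be exposed as a boolean check. `UnhealthyCountMetric`, `register`, `survey` and the check names are hypothetical and are not existing Accumulo APIs. The count is recomputed on every read rather than incremented over time, so it falls back to zero on its own once every check recovers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch (not Accumulo code): an "unhealthy count" that is
 * recomputed from the registered health checks each time it is read.
 */
class UnhealthyCountMetric {

  // Check name -> supplier that returns true while that condition is unhealthy.
  private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

  /** Registers a named check, e.g. "stuck-compaction" or "metadata-validation". */
  public void register(String name, BooleanSupplier isUnhealthy) {
    checks.put(name, isUnhealthy);
  }

  /**
   * Surveys every registered check right now and returns how many are unhealthy.
   * Nothing is accumulated between calls, so the value returns to zero once
   * every check reports healthy again.
   */
  public int survey() {
    int unhealthy = 0;
    for (BooleanSupplier check : checks.values()) {
      if (check.getAsBoolean()) {
        unhealthy++;
      }
    }
    return unhealthy;
  }
}
```

A metrics gauge could then call `survey()` each time it is sampled. With Micrometer, for instance, this might be `Gauge.builder("tserver.unhealthy.count", metric, m -> m.survey()).register(registry)`, where the metric name is only illustrative. Sampling a live survey, rather than incrementing a counter whenever a problem is logged, is what lets the value drop back to zero without any explicit reset.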
