kirklund commented on code in PR #7747:
URL: https://github.com/apache/geode/pull/7747#discussion_r894864266
##########
geode-core/src/main/java/org/apache/geode/internal/monitoring/executor/AbstractExecutor.java:
##########
@@ -51,9 +56,25 @@ protected AbstractExecutor(String groupName, long threadID) {
public void handleExpiry(long stuckTime, Map<Long, ThreadInfo>
threadInfoMap) {
incNumIterationsStuck();
+ sendAlertForThreadStuckForLong(stuckTime, threadInfoMap);
logger.warn(createThreadReport(stuckTime, threadInfoMap));
}
+ private void sendAlertForThreadStuckForLong(long stuckTime, Map<Long,
ThreadInfo> threadInfoMap) {
+ if (maxThreadStuckTime <= 0) {
+ return;
+ }
+ if (threadInfoMap.get(threadID) == null) {
+ return;
+ }
+ if (stuckForGood) {
+ return;
+ }
+ if (stuckTime > maxThreadStuckTime) {
+ stuckForGood = true;
+ logger.fatal(createThreadReport(stuckTime, threadInfoMap));
Review Comment:
`fatal` should be reserved for logging about catastrophic failure that may
result in data corruption. The server should probably be taken off-line or
bounced.
`error` should be used to indicate that something failed but there should be
no data corruption and it should generally be ok to allow the server to
continue running.
I think you probably want this log statement to be at `error` level which
will still generate an `alert`.
I'd like to add @dschneider-pivotal as a reviewer to confirm the above.
Darrel, if you want to give me good definitions, I'd be happy to add it to the
Apache Geode Wiki.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]