kirklund commented on code in PR #7747:
URL: https://github.com/apache/geode/pull/7747#discussion_r894864266


##########
geode-core/src/main/java/org/apache/geode/internal/monitoring/executor/AbstractExecutor.java:
##########
@@ -51,9 +56,25 @@ protected AbstractExecutor(String groupName, long threadID) {
 
   public void handleExpiry(long stuckTime, Map<Long, ThreadInfo> 
threadInfoMap) {
     incNumIterationsStuck();
+    sendAlertForThreadStuckForLong(stuckTime, threadInfoMap);
     logger.warn(createThreadReport(stuckTime, threadInfoMap));
   }
 
+  private void sendAlertForThreadStuckForLong(long stuckTime, Map<Long, 
ThreadInfo> threadInfoMap) {
+    if (maxThreadStuckTime <= 0) {
+      return;
+    }
+    if (threadInfoMap.get(threadID) == null) {
+      return;
+    }
+    if (stuckForGood) {
+      return;
+    }
+    if (stuckTime > maxThreadStuckTime) {
+      stuckForGood = true;
+      logger.fatal(createThreadReport(stuckTime, threadInfoMap));

Review Comment:
   `fatal` should be reserved for logging about catastrophic failure that may 
result in data corruption. The server should probably be taken off-line or 
bounced.
   
   `error` should be used to indicate that something failed but there should be 
no data corruption and it should generally be ok to allow the server to 
continue running.
   
   I think you probably want this log statement to be at `error` level which 
will still generate an `alert`.
   
   I'd like to add @dschneider-pivotal as a reviewer to confirm the above. 
Darrel, if you want to give me good definitions, I'd be happy to add it to the 
Apache Geode Wiki.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to