The ASF Sonar installation managed to generate 46GB of identical log messages [1] today in the 8 hours it took for anyone to notice that it was down.
While better monitoring would (or should) have identified the problem sooner, the incident does expose a problem with the acceptor threads in all three endpoints. If a system-level issue causes every accept() call to fail (such as hitting the ulimit), the endpoint effectively enters a tight loop that logs an error message on every iteration, producing many log messages per second.

I'd like to do something about this. I was thinking of something along the lines of the following for each endpoint:

Index: java/org/apache/tomcat/util/net/JIoEndpoint.java
===================================================================
--- java/org/apache/tomcat/util/net/JIoEndpoint.java	(revision 1072939)
+++ java/org/apache/tomcat/util/net/JIoEndpoint.java	(working copy)
@@ -183,9 +183,19 @@
         @Override
         public void run() {
 
+            int errorDelay = 0;
+
             // Loop until we receive a shutdown command
             while (running) {
 
+                if (errorDelay > 0) {
+                    try {
+                        Thread.sleep(errorDelay);
+                    } catch (InterruptedException e) {
+                        // Ignore
+                    }
+                }
+
                 // Loop if endpoint is paused
                 while (paused && running) {
                     try {
@@ -225,9 +235,15 @@
                             // Ignore
                         }
                     }
+                    errorDelay = 0;
                 } catch (IOException x) {
                     if (running) {
                         log.error(sm.getString("endpoint.accept.fail"), x);
+                        if (errorDelay == 0) {
+                            errorDelay = 50;
+                        } else if (errorDelay < 1600) {
+                            errorDelay = errorDelay * 2;
+                        }
                     }
                 } catch (NullPointerException npe) {
                     if (running) {

Thoughts / comments?

Mark

[1] http://pastebin.com/CrsujeW4
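For anyone who wants to check the backoff policy in isolation, here is a minimal standalone sketch of the same logic as the JIoEndpoint patch. The class name AcceptErrorBackoff and the methods nextDelay() and delaysFor() are hypothetical and not part of Tomcat; only the constants (50ms initial delay, doubling, 1600ms cap, reset to 0 on a successful accept) come from the diff. Instead of opening real sockets it feeds in a sequence of simulated accept() outcomes and records the delay the acceptor would sleep before its next attempt.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical standalone sketch of the capped exponential backoff
 * from the proposed JIoEndpoint patch (not Tomcat code).
 */
final class AcceptErrorBackoff {

    // Constants taken from the patch.
    static final int INITIAL_DELAY_MS = 50;
    static final int MAX_DELAY_MS = 1600;

    /** Delay to apply after one more consecutive accept() failure. */
    static int nextDelay(int currentDelay) {
        if (currentDelay == 0) {
            return INITIAL_DELAY_MS;      // first failure in a row
        } else if (currentDelay < MAX_DELAY_MS) {
            return currentDelay * 2;      // double until the cap is reached
        }
        return currentDelay;              // already at the cap, stay there
    }

    /**
     * For each simulated accept() outcome (true = accept() threw an
     * IOException), returns the delay the acceptor would sleep before
     * its next attempt.
     */
    static List<Integer> delaysFor(boolean[] failures) {
        List<Integer> delays = new ArrayList<>();
        int errorDelay = 0;
        for (boolean failed : failures) {
            // A successful accept resets the delay, as in the patch.
            errorDelay = failed ? nextDelay(errorDelay) : 0;
            delays.add(errorDelay);
        }
        return delays;
    }

    public static void main(String[] args) {
        // Seven consecutive failures followed by one successful accept.
        boolean[] outcomes = {true, true, true, true, true, true, true, false};
        // Prints [50, 100, 200, 400, 800, 1600, 1600, 0]
        System.out.println(delaysFor(outcomes));
    }
}
```

Because every failed iteration sleeps for at least the current delay, once the cap is reached each acceptor thread can log at most one accept failure roughly every 1.6 seconds (no more than 2,250 messages per hour) for as long as the underlying problem persists. The error is still logged on every failure, so nothing is hidden from the admin.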