On 14/07/2014 17:01, Christopher Schultz wrote:
> All,
>
> It seems that ContainerBackgroundProcessor can die and end up
> silently destabilizing things.

It is only silent if the system admin has done something stupid like
redirecting stderr to /dev/null. Frankly, if they do that they deserve
everything they get.

<snip/>

> That means that even if the OOME was transient and the JVM could
> recover from the failure,

I'd argue that an OOME is never recoverable. Once it occurs you have no
idea what allocations may have failed. Even if the thread where the
error is reported can recover from the OOME, you have no way of
determining if any other threads were affected or how they were
affected.

> the background processor thread is dead and things like sessions
> will pile up until memory is truly exhausted.
>
> The problem is in the code for the runnable method. Somewhat
> simplified, it's just a loop to do stuff:
>
> while(!done) {
>   try {
>     sleep();
>     processChildren();
>   } catch (Throwable t) {
>     ExceptionUtils.handleException(t);
>     log("error", t);
>   }
> }
>
> Although the stack trace doesn't show it, the above error clearly
> occurred in processChildren().
>
> ExceptionUtils.handleException checks for two things that I think
> we might want to change:
>
> 1. If the exception is StackOverflowError, it silently ignores the
> error and continues. I think we should at least log something,
> probably at WARN level.

No it doesn't. Go and read the code again. The exception is logged at
ERROR level.

> 2. If the exception is VirtualMachineError, it gets re-thrown with
> no log. This skips the "log" call in the above code and so the only
> log will come from the VM's "unhandled exception" logger which may
> not go where you expect it to go.

It goes to stderr unless the system admin has redirected it and if they
have, they should know where to look for it. Further, they should also
be monitoring it.

> The exception propagates, and the thread's run() method finishes
> (escapes due to uncaught exception).
> After that, regardless of the
> recoverability of the situation (OOME), the background processor
> will not run and therefore no auto-reload applications will
> auto-reload, no sessions will ever die, etc.
>
> If we think that StackOverflowError is recoverable, why not
> OutOfMemory?

There is no assumption that StackOverflowError is recoverable. That is
why it is logged at ERROR level. It is guaranteed that only that thread
was affected so you know (from the stack trace) exactly what failed
where and you can also determine how bad things are and opt to restart
Tomcat if necessary.

With respect to OOME, how exactly do you propose to differentiate
between a "recoverable" OOME and a non-recoverable one?

> What about other VirtualMachineErrors?

The current position is that they are all non-recoverable. If it can be
demonstrated that more of them should be treated like StackOverflow
then we can do so.

Mark
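
For illustration, the behaviour discussed in this thread can be sketched
as a small standalone program. This is a minimal sketch and not the
Tomcat source: the class BackgroundLoopSketch, the handle() method, the
logged list and the three-iteration loop are all hypothetical stand-ins.
It assumes only what the thread describes: a handler that swallows
StackOverflowError (leaving the caller to log it) and re-throws every
other VirtualMachineError, which skips the caller's log call and ends
the loop.

```java
import java.util.ArrayList;
import java.util.List;

public class BackgroundLoopSketch {

    // Messages "logged" by the loop; a real container would use its logger.
    public static final List<String> logged = new ArrayList<>();

    // Hypothetical handler mirroring the behaviour described in the thread.
    public static void handle(Throwable t) {
        if (t instanceof StackOverflowError) {
            return; // swallowed here; the caller still logs it
        }
        if (t instanceof VirtualMachineError) {
            // e.g. OutOfMemoryError: re-thrown, so the caller's log call is skipped
            throw (VirtualMachineError) t;
        }
        // Anything else is swallowed here; the caller logs it.
    }

    // Runs three iterations; the second one fails with the supplied Throwable.
    // Returns how many iterations reached the end of the loop body.
    public static int runLoop(Throwable failure) {
        int finished = 0;
        for (int i = 0; i < 3; i++) {
            try {
                if (i == 1) {
                    throw failure; // stand-in for a failing processChildren()
                }
            } catch (Throwable t) {
                handle(t); // may re-throw, which ends the loop
                logged.add("ERROR " + t.getClass().getSimpleName());
            }
            finished++;
        }
        return finished;
    }

    public static void main(String[] args) {
        // StackOverflowError: logged by the caller, loop keeps going.
        System.out.println(runLoop(new StackOverflowError("simulated")) + " " + logged);

        // OutOfMemoryError: escapes the loop without being logged by it.
        try {
            runLoop(new OutOfMemoryError("simulated"));
        } catch (OutOfMemoryError e) {
            System.out.println("loop ended by " + e);
        }
    }
}
```

In the sketch the simulated StackOverflowError leaves one ERROR entry
and all three iterations complete, while the simulated OutOfMemoryError
propagates out of runLoop() with no entry added. In a real thread that
propagation would end run() and leave the report to the JVM's uncaught
exception handling.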