Re: ContainerBackgroundProcessor and compounding OOMEs

Mark Thomas Mon, 14 Jul 2014 09:18:57 -0700

On 14/07/2014 17:01, Christopher Schultz wrote:
> All,
> 
> It seems that ContainerBackgroundProcessor can die and end up
> silently destabilizing things.


It is only silent if the system admin has done something stupid like
redirecting stderr to /dev/null. Frankly, if they do that they deserve
everything they get.

<snip/>

> That means that even if the OOME was transient and the JVM could
> recover from the failure,

I'd argue that an OOME is never recoverable. Once it occurs you have
no idea what allocations may have failed. Even if the thread where the
error is reported can recover from the OOME, you have no way of
determining if any other threads were affected or how they were affected.

> the background processor thread is dead and things like sessions
> will pile up until memory is truly exhausted.
> 
> The problem is in the code for the runnable method. Somewhat
> simplified, it's just a loop to do stuff:
> 
> while (!done) {
>     try {
>         sleep();
>         processChildren();
>     } catch (Throwable t) {
>         ExceptionUtils.handleException(t);
>         log("error", t);
>     }
> }
> 
> Although the stack trace doesn't show it, the above error clearly 
> occurred in processChildren().
> 
> ExceptionUtils.handleException checks for two things that I think
> we might want to change:
> 
> 1. If the exception is StackOverflowError, it silently ignores the
> error and continues. I think we should at least log something,
> probably at WARN level.

No it doesn't. Go and read the code again. The exception is logged at
ERROR level.
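
For anyone following along, the flow in question can be sketched roughly
as below. This is a simplified illustration, not Tomcat's source: the
class name, triage(), runOnce() and the in-memory LOG list are all made
up for the example, and the triage rules are only those described in
this thread. The point is that the helper returns normally for a
StackOverflowError, so the log call that follows it in the catch block
still runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: a sketch of the flow discussed in this
// thread, not the real Tomcat classes.
class BackgroundLoopSketch {

    // Stand-in for a logger so the outcome can be inspected.
    static final List<String> LOG = new ArrayList<>();

    // Triage as described in this thread: StackOverflowError returns
    // normally, any other VirtualMachineError is re-thrown to the caller.
    static void triage(Throwable t) {
        if (t instanceof StackOverflowError) {
            return;
        }
        if (t instanceof VirtualMachineError) {
            throw (VirtualMachineError) t;
        }
        // Anything else is left for the caller to log.
    }

    // One pass of the loop body from the quoted pseudocode.
    static void runOnce(Runnable processChildren) {
        try {
            processChildren.run();
        } catch (Throwable t) {
            triage(t); // re-throws for non-StackOverflow VM errors
            LOG.add("ERROR " + t.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        // Simulated error; a real one would come from deep recursion.
        runOnce(() -> { throw new StackOverflowError(); });
        System.out.println(LOG);
    }
}
```

Because triage() returns rather than throws in the StackOverflowError
case, the entry is added to LOG and the enclosing loop would carry on to
its next iteration.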

> 2. If the exception is VirtualMachineError, it gets re-thrown with
> no log. This skips the "log" call in the above code and so the only
> log will come from the VM's "unhandled exception" logger which may
> not go where you expect it to go.

It goes to stderr unless the system admin has redirected it and if
they have, they should know where to look for it. Further, they should
also be monitoring it.
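
To illustrate where an error ends up once it escapes run(), here is a
small standalone sketch. The class and method names are hypothetical and
the OutOfMemoryError is simulated. It installs a per-thread
UncaughtExceptionHandler purely so the escaped error can be captured and
inspected; with no handler installed, the JVM's default behaviour is to
print the stack trace to System.err.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: shows what happens to a Throwable that escapes
// a thread's run() method.
class UncaughtErrorSketch {

    // Runs the task on its own thread and returns whatever Throwable
    // escaped run(), or null if the task completed normally.
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> escaped = new AtomicReference<>();
        Thread worker = new Thread(task, "background-processor-sketch");
        // Replaces the default "print to System.err" behaviour for this
        // one thread so the error can be examined afterwards.
        worker.setUncaughtExceptionHandler((thread, error) -> escaped.set(error));
        worker.start();
        try {
            worker.join(); // the thread has terminated once join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return escaped.get();
    }

    public static void main(String[] args) {
        Throwable t = runAndCapture(() -> {
            throw new OutOfMemoryError("simulated");
        });
        System.out.println("thread ended with: " + t);
    }
}
```

Either way the thread is gone afterwards; the handler only determines
where the report is written.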

> The exception propagates, and the thread's run() method finishes
> (escapes due to uncaught exception). After that, regardless of the
> recoverability of the situation (OOME), the background processor
> will not run and therefore no auto-reload applications will 
> auto-reload, no sessions will ever die, etc.
> 
> If we think that StackOverflowError is recoverable, why not
> OutOfMemory?

There is no assumption that StackOverflowError is recoverable. That is
why it is logged at ERROR level. It is guaranteed that only that
thread was affected so you know (from the stack trace) exactly what
failed where and you can also determine how bad things are and opt to
restart Tomcat if necessary.

With respect to OOME how, exactly, do you propose to differentiate
between a "recoverable" OOME and a non-recoverable one?

> What about other VirtualMachineErrors?

The current position is that they are all non-recoverable. If it can
be demonstrated that more of them should be treated like StackOverflow
then we can do so.

Mark

