On Tue, Mar 13, 2018 at 1:18 AM, Andrey Kornev <andrewkor...@hotmail.com> wrote:
> I believe the only reasonable way to handle a critical system failure (as
> it is defined in the IEP) is a JVM halt (not a graceful exit/shutdown!).
> The sooner, the better: less impact. There’s simply no way to reason
> about the state of the system in a situation like that; all bets are off.
> Any other policy would only confuse matters and in all likelihood make
> things worse.
>
> In practice, SREs/Operations would much rather have a process die a
> quick, clean death than let it run indefinitely and hope that it’ll somehow
> recover by itself at some point in the future, potentially degrading the
> overall system stability and availability all the while.

Completely agree.