Re: IEP-14: Ignite failures handling (Discussion)

Dmitriy Setrakyan Mon, 12 Mar 2018 22:27:06 -0700

On Tue, Mar 13, 2018 at 1:18 AM, Andrey Kornev <[email protected]>
wrote:


> I believe the only reasonable way to handle a critical system failure (as
> it is defined in the IEP) is a JVM halt (not a graceful exit/shutdown!).
> The sooner, the better: the impact is smaller. There’s simply no way to reason
> about the state of the system in a situation like that; all bets are off.
> Any other policy would only confuse matters and in all likelihood make
> things worse.
>
> In practice, SREs/Operations would much rather have a process die a
> quick, clean death than let it run indefinitely and hope that it’ll somehow
> recover by itself at some point in the future, potentially degrading the
> overall system stability and availability all the while.
>

Completely agree.
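As a rough illustration of the halt-versus-graceful-exit distinction in the quoted message: in Java, System.exit() starts the JVM shutdown sequence and runs registered shutdown hooks, whereas Runtime.getRuntime().halt() terminates the JVM immediately without running them. The sketch below is hypothetical. The class name HaltOnCriticalFailure, the method onCriticalFailure, and the exit code are assumptions for illustration only, not part of Ignite's API or of the IEP.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical sketch (not an Ignite API): on a critical system failure,
 * halt the JVM immediately instead of attempting a graceful shutdown.
 */
final class HaltOnCriticalFailure {
    /** Non-zero exit code reported to the process supervisor (value is an assumption). */
    static final int CRITICAL_FAILURE_EXIT_CODE = 1;

    // Injectable so the policy can be exercised without actually killing the JVM.
    private final IntConsumer terminator;

    HaltOnCriticalFailure() {
        // Runtime.halt() stops the JVM at once: shutdown hooks do NOT run,
        // unlike System.exit(), which runs them and may block on them.
        this(code -> Runtime.getRuntime().halt(code));
    }

    HaltOnCriticalFailure(IntConsumer terminator) {
        this.terminator = terminator;
    }

    /** Invoked when, e.g., a critical system thread dies unexpectedly. */
    void onCriticalFailure(String component, Throwable cause) {
        // Best-effort diagnostics only: write straight to stderr and avoid
        // anything (logging frameworks, locks, network) that may itself be broken.
        System.err.println("Critical failure in " + component + ": " + cause + "; halting JVM");
        System.err.flush();
        terminator.accept(CRITICAL_FAILURE_EXIT_CODE);
    }
}
```

The terminator is injected only so the policy can be tested without terminating the JVM; in production the no-argument constructor would be used. Skipping shutdown hooks matters here because a hook may try to acquire locks or resources held by the failed component and hang, leaving the process in exactly the indefinite, degraded state the quoted message argues against.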
