I believe the only reasonable way to handle a critical system failure (as it is
defined in the IEP) is a JVM halt (not a graceful exit/shutdown!). The sooner
the better, since a faster halt means less impact. There’s simply no way to
reason about the state of the system in a situation like that; all bets are off.
Any other policy would only confuse matters and in all likelihood make things
worse.

In practice, SREs/Operations would much rather have a process die a quick,
clean death than let it run indefinitely in the hope that it’ll somehow recover
by itself at some point in the future, potentially degrading overall system
stability and availability all the while.
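
To make the halt vs. graceful exit distinction concrete, here is a minimal
Java sketch. System.exit() runs shutdown hooks, which may block or misbehave
when the process state is already corrupted, while Runtime.halt() terminates
the JVM immediately without running them. The class name, the exit code and
the injectable terminator are assumptions made for illustration only; they
are not part of the IEP or of any Ignite API.

```java
import java.util.function.IntConsumer;

// Hypothetical helper, not an Ignite class.
final class CriticalFailureHalter {
    // Hypothetical exit code; pick whatever your process supervisor expects.
    static final int HALT_EXIT_CODE = 130;

    private final IntConsumer terminator;

    // Default: halt the JVM immediately, skipping shutdown hooks.
    CriticalFailureHalter() {
        this(status -> Runtime.getRuntime().halt(status));
    }

    // The terminator is injectable so the policy can be exercised in tests
    // without actually killing the JVM.
    CriticalFailureHalter(IntConsumer terminator) {
        this.terminator = terminator;
    }

    void onCriticalFailure(Throwable cause) {
        try {
            // Best-effort diagnostics; must not be allowed to delay the halt.
            System.err.println("Critical failure, halting JVM: " + cause);
        } finally {
            terminator.accept(HALT_EXIT_CODE);
        }
    }
}
```

With the default constructor, onCriticalFailure() never returns. Replacing
the terminator with System::exit would give the graceful variant I am arguing
against.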

Andrey
_____________________________
From: Dmitriy Setrakyan <dsetrak...@apache.org>
Sent: Monday, March 12, 2018 5:23 PM
Subject: Re: IEP-14: Ignite failures handling (Discussion)
To: <dev@ignite.apache.org>


On Mon, Mar 12, 2018 at 5:12 PM, Denis Magda <dma...@apache.org> wrote:

> Dmitriy,
>
> An Ignite client node is usually used in embedded mode. By killing the
> whole process the node is running in, we're going to kill the entire
> application. That doesn't sound like a good plan. That's why my suggestion
> is to try to kill the node somehow rather than the whole process.
>

Agreed. However, if the node cannot stop gracefully, we should kill the
process anyway. This should be the default behavior. Users should be able to
turn it off as needed.


>
> As for the server nodes, which usually own the whole process, it's totally
> fine to kill the process right away.
>

Well, even here I would still try to gracefully stop the node first. If
that cannot be done, then we should kill the process.
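
A rough Java sketch of this "stop gracefully first, kill the process if that
fails" policy could look like the following. The stopNode callback, the
timeout and the exit code are hypothetical placeholders; nothing here is an
existing Ignite API, and in real use the halt callback would be something
like Runtime.getRuntime()::halt.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.IntConsumer;

// Hypothetical helper, not an Ignite class.
final class StopThenHalt {
    /**
     * Tries to stop the node within timeoutMs. Returns true if the stop
     * completed in time; otherwise invokes the halt callback and returns
     * false (with a real halt the method would never return).
     */
    static boolean stopOrHalt(Runnable stopNode, long timeoutMs, IntConsumer halt) {
        // Daemon thread, so a hung stop attempt cannot keep the JVM alive.
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "node-stopper");
            t.setDaemon(true);
            return t;
        });
        Future<?> stop = exec.submit(stopNode);
        try {
            stop.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true; // graceful stop succeeded
        } catch (TimeoutException | ExecutionException e) {
            halt.accept(1); // stop hung or threw: kill the process
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            halt.accept(1);
            return false;
        } finally {
            exec.shutdownNow();
        }
    }
}
```

Turning the default off, as suggested above, would then amount to passing a
no-op halt callback.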


>
> --
> Denis
>
> On Mon, Mar 12, 2018 at 4:12 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
>
> > Denis, what is the difference between killing the process and killing the
> > node and the process?
> >
> > D.
> >
> > On Mon, Mar 12, 2018 at 12:03 PM, Denis Magda <dma...@apache.org> wrote:
> >
> > > Guys,
> > >
> > > I would make a decision depending on the type of the problematic node:
> > >
> > > - If it's a *server node*, then let's simply kill the process, because
> > >   the node usually owns the whole process. I don't see a practical
> > >   reason why a user would want to run 2 server nodes in a single process.
> > > - If it's a *client node*, then the best approach is to kill the node
> > >   and not the process.
> > >
> > > --
> > > Denis
> > >
> > > On Mon, Mar 12, 2018 at 3:04 AM, Dmitry Pavlov <dpavlov....@gmail.com>
> > > wrote:
> > >
> > > > Hi Andrey, Igniters,
> > > >
> > > > Thank you for starting this topic, because this is a really important
> > > > decision.
> > > >
> > > > If Ignite is started within an application server alongside other
> > > > applications, JVM termination will kill all the services started there.
> > > >
> > > > So I suggest that this option not be the default. We can add this option
> > > > (action="JVM termination") as pre-configured for ignite.sh/bat, since we
> > > > know it is a separate JVM. But I would not vote for this option if it
> > > > were the default in code.
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > On Mon, Mar 12, 2018 at 12:57, Andrey Kuznetsov <stku...@gmail.com> wrote:
> > > >
> > > > > To my mind, the default action should be as severe as possible, that
> > > > > is, entire JVM termination, since we are dealing with critical errors.
> > > > > In the case of some custom setup (e.g. different cluster nodes in one
> > > > > JVM), the failure response action should be configured explicitly.
> > > > >
> > > > > 2018-03-12 12:32 GMT+03:00 Andrey Gura <ag...@apache.org>:
> > > > >
> > > > > > Igniters!
> > > > > >
> > > > > > We are working on the proposal described in IEP-14 Ignite failures
> > > > > > handling [1], and it's time to discuss it with the community
> > > > > > (although this should have been done earlier).
> > > > > >
> > > > > > The most important question: what should be the default behaviour
> > > > > > in case of failure? There are 4 actions:
> > > > > >
> > > > > > 1. Restart the JVM process (possible only if the process was started
> > > > > > from the ignite.(sh|bat) script);
> > > > > > 2. Terminate the JVM;
> > > > > > 3. Stop the node (if there is only one node in the process, then the
> > > > > > process will also be terminated);
> > > > > > 4. No operation.
> > > > > >
> > > > > > I believe that the node should be stopped by default. But there is a
> > > > > > chance that the node will not be stopped correctly.
> > > > > >
> > > > > > Maybe we should terminate the JVM process by default. But that will
> > > > > > kill all nodes in the JVM process. It's especially bad behaviour in
> > > > > > the case when the nodes belong to different Ignite clusters (a real
> > > > > > use case).
> > > > > >
> > > > > > Maybe we should restart the JVM process by default. This approach has
> > > > > > the same problems as the previous one. Additionally, it could lead to
> > > > > > continuous restarts and, therefore, continuous exchanges and
> > > > > > rebalancing.
> > > > > >
> > > > > > It's a difficult choice. Could you please share your thoughts?
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrey Kuznetsov.
> > > > >
> > > >
> > >
> >
>

