Re: IEP-14: Ignite failures handling (Discussion)

Valentin Kulichenko Tue, 13 Mar 2018 17:13:16 -0700

Ivan,

If the grid hangs, a graceful shutdown would most likely hang as well. You
can almost never recover from a bad state using graceful procedures.

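To make the "try graceful, then kill the process regardless" default quoted
below concrete, here is a minimal sketch of that policy with a bounded wait,
so that a hung graceful stop cannot block the kill. StopThenKill,
stopThenKill and the timeout parameter are hypothetical names for
illustration only; they are not Ignite API, and the stop and kill actions
are passed in by the caller.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Ignite): bounded graceful stop, then an unconditional kill. */
final class StopThenKill {
    /**
     * Runs gracefulStop on a daemon thread and waits at most timeoutMs for it.
     * The kill action is always invoked afterwards, whether the stop
     * completed, failed, or hung.
     *
     * @return true if the graceful stop completed within the timeout.
     */
    static boolean stopThenKill(Runnable gracefulStop, long timeoutMs, Runnable kill) {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "graceful-stop");
            t.setDaemon(true); // a hung stop thread must not keep the JVM alive
            return t;
        });

        boolean stopped = false;
        try {
            Future<?> f = exec.submit(gracefulStop);
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            stopped = true;
        } catch (TimeoutException e) {
            // The graceful stop hung; fall through to the kill.
        } catch (ExecutionException e) {
            // The graceful stop threw; fall through to the kill.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        } finally {
            exec.shutdownNow(); // best-effort interrupt of the stop thread
            kill.run();         // "kill regardless"
        }
        return stopped;
    }
}
```

In a real deployment the stop action could be something like
() -> Ignition.stop(true) and the kill action
() -> Runtime.getRuntime().halt(1); the exit code and any timeout value are
assumptions, not something this thread has agreed on.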

I agree that we should not create two defaults, especially in this case.
It's not even strictly defined what an embedded node is in Ignite. For
example, if I start it using a custom main class and/or a custom script
instead of ignite.sh, would it be an embedded or a standalone node?

-Val

On Tue, Mar 13, 2018 at 4:58 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:

> One more note: "kill if standalone, stop if embedded" differs from what
> you are suggesting ("try graceful, then kill the process regardless") only
> in the case when graceful shutdown hangs.
> Do we have an understanding of how often graceful shutdown hangs?
> Obviously, a *grid hang* is a common case, but it shouldn't be confused
> with a *graceful shutdown hang*. From my experience, if something goes
> wrong, users just prefer to do kill -9 because it's much more reliable and
> easy. Probably, in most cases when kill -9 worked, a graceful stop would
> have worked as well; we just don't have such statistics.
> It may be a bad example, but in our CI tests we intentionally break the
> grid in many harsh ways and perform a graceful stop after the test
> execution, and it doesn't hang. Otherwise we'd see many "Execution
> timeout" test suite hangs.
>
> Best Regards,
> Ivan Rakov
>
>
> On 14.03.2018 2:24, Dmitriy Setrakyan wrote:
>
>> On Tue, Mar 13, 2018 at 7:13 PM, Ivan Rakov <ivan.glu...@gmail.com>
>> wrote:
>>
>>> I would just like to add my +1 for the "kill if standalone, stop if
>>> embedded" default option. My arguments:
>>>
>>> 1) Regarding "If Ignite hangs - it will likely be impossible to stop":
>>> Unfortunately, it's true that Ignite can hang during the stop procedure.
>>> However, most of the failures described under IEP-14 (storage IO
>>> exceptions, death of a critical system worker thread, etc.) normally
>>> shouldn't turn a node into an "impossible to stop" state. Turning into
>>> that state is a bug in itself. I guess that we shouldn't choose system
>>> behavior on the basis of known bugs.
>>>
>>
>> The whole discussion is about protecting against force majeure issues,
>> including Ignite bugs. You are assuming that a user application will
>> somehow continue to function if an Ignite node is stopped. In most cases
>> it will just freeze and cause the rest of the application to hang.
>>
>> Again, "kill+stop" is the most deterministic and the safest default
>> behavior. Try a graceful shutdown (which will make restart easier), and
>> then kill the process regardless.
>>
>> Note that we are arguing about the default behavior. If a user does not
>> like this default, then this user can change it to another behavior.
>>
>>
>>> 2) A user might want to handle an Ignite node crash before shutting down
>>> the whole JVM: raise an alert, close external resources, etc.
>>>
>> Very unlikely, but if a user is this advanced, then this user can change
>> the default behavior. Most users will not even know how to configure such
>> custom shutdown behavior and would prefer an automatic kill.
>>
>>> 3) The IEP-14 document has important notes: "More than one Ignite node
>>> could be started in one JVM process" and "Different nodes in one JVM
>>> process could belong to different clusters". This is possible only in
>>> embedded mode. I think we shouldn't shock the user with a sudden JVM halt
>>> (possibly along with other healthy nodes) if there's a chance of a
>>> successful node stop.
>>>
>> Has anyone actually seen a real example of that? I have not. This
>> scenario is extremely unlikely and should not define the default
>> behavior. Again, if a user is advanced enough to come up with such a
>> sophisticated deployment, then the same user should be able to set
>> different default behaviors for different clusters.
>>
>>
>
