Ivan,

If the grid hangs, graceful shutdown will most likely hang as well. You can almost never recover from a bad state using graceful procedures.
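
To make the "try graceful, then kill process regardless" option from the quoted thread concrete, here is a minimal sketch in plain Java. It is not Ignite code: `StopThenKill`, `run`, and both callbacks are hypothetical names used only for illustration. `gracefulStop` stands in for whatever stops the node, and `kill` stands in for a hard process kill. The point is that the graceful attempt has to be bounded by a timeout, because it can hang for the same reason the grid did.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Ignite): bounded graceful stop, then an unconditional kill. */
final class StopThenKill {
    /**
     * @param gracefulStop stand-in for a graceful node stop; assumed to be able to hang.
     * @param timeoutMs    how long to wait for the graceful stop before giving up.
     * @param kill         stand-in for a hard kill, e.g. () -> Runtime.getRuntime().halt(1).
     * @return true if the graceful stop completed within the timeout
     *         (only observable when {@code kill} does not terminate the JVM).
     */
    static boolean run(Runnable gracefulStop, long timeoutMs, Runnable kill) {
        // Run the stop on a daemon thread so a hung stop cannot keep the JVM alive.
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "graceful-stop");
            t.setDaemon(true);
            return t;
        });

        boolean stopped = false;
        try {
            Future<?> f = exec.submit(gracefulStop);
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            stopped = true;
        } catch (TimeoutException | ExecutionException e) {
            // Graceful stop hung or failed; fall through to the kill.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            exec.shutdownNow(); // interrupt the stop thread if it is still running
            kill.run();         // kill regardless of how the graceful attempt ended
        }
        return stopped;
    }
}
```

With `kill` set to `() -> Runtime.getRuntime().halt(1)` the method never returns. Passing the kill action in as a `Runnable` is only there so the sketch can be exercised in a test without taking the JVM down.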

I agree that we should not create two defaults, especially in this case. It's not even strictly defined what an embedded node is in Ignite. For example, if I start it using a custom main class and/or a custom script instead of ignite.sh, would it be an embedded or a standalone node?

-Val

On Tue, Mar 13, 2018 at 4:58 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:

> One more note: "kill if standalone, stop if embedded" differs from what
> you are suggesting "try graceful, then kill process regardless" only in
> case when graceful shutdown hangs.
> Do we have understanding, how often does graceful shutdown hang?
> Obviously, *grid hang* is often case, but it shouldn't be messed with
> *graceful shutdown hang*. From my experience, if something went wrong,
> users just prefer to do kill -9 because it's much more reliable and easy.
> Probably, in most of cases when kill -9 worked, graceful stop would have
> worked as well - we just don't have such statistics.
> It may be bad example, but: in our CI tests we intentionally break grid in
> many harsh ways and perform a graceful stop after the test execution, and
> it doesn't hang - otherwise we'd see many "Execution timeout" test suite
> hangs.
>
> Best Regards,
> Ivan Rakov
>
>
> On 14.03.2018 2:24, Dmitriy Setrakyan wrote:
>
>> On Tue, Mar 13, 2018 at 7:13 PM, Ivan Rakov <ivan.glu...@gmail.com>
>> wrote:
>>
>>> I just would like to add my +1 for "kill if standalone, stop if
>>> embedded" default option. My arguments:
>>>
>>> 1) Regarding "If Ignite hangs - it will likely be impossible to stop":
>>> Unfortunately, it's true that Ignite can hang during stop procedure.
>>> However, most of failures described under IEP-14 (storage IO exceptions,
>>> death of critical system worker thread, etc) normally shouldn't turn node
>>> into "impossible to stop" state. Turning into that state is a bug
>>> itself. I guess that we shouldn't choose system behavior on the basis
>>> of known bugs.
>>>
>>
>> The whole discussion is about protecting against force-major issues,
>> including Ignite bugs. You are assuming that a user application will
>> somehow continue to function if an Ignite node is stopped. In most cases
>> it will just freeze itself and cause the rest of the application to hang.
>>
>> Again, "kill+stop" is the most deterministic and the safest default
>> behavior. Try a graceful shutdown (which will make restart easier), and
>> then kill the process regardless.
>>
>> Note that we are arguing about the default behavior. If a user does not
>> like this default, then this user can change it to another behavior.
>>
>>
>>> 2) User might want to handle Ignite node crash before shutting down the
>>> whole JVM - raise alert, close external resources, etc
>>>
>>
>> Very unlikely, but if a user is this advanced, then this user can change
>> the default behavior. Most users will not even know how to configure such
>> custom shutdown behavior and would prefer an automatic kill.
>>
>>> 3) IEP-14 document has important notes: "More than one Ignite node could
>>> be started in one JVM process" and "Different nodes in one JVM process
>>> could belong to different clusters". This is possible only in embedded
>>> mode. I think, we shouldn't shock user by sudden JVM halt (possibly,
>>> along with another healthy nodes) if there's a chance of successful node
>>> stop.
>>>
>>
>> Has anyone actually seen a real example of that? I have not. This
>> scenario is extremely unlikely and should not define the default
>> behavior. Again, if a user is so advanced to come up with such a
>> sophisticated deployment, then the same user should be able to set
>> different default behaviors for different clusters.
>>