Dmitriy, I think you and the other participants in the discussion are talking about different cases.
Maybe it would be useful to look at specific cases and discuss each of them separately?

I look at the IEP page and see the following:

```
File IO errors. Usually IOException's threw by read/write operations on file system.
The following subsystems should be considered as critical:
* WAL
* Page store
* Meta store
* Binary meta store
```

Suppose we run out of disk space on some node and everything else is fine. Should we do `System.exit(-1);` in that case?

Personally, I fully agree with Nick Podrash:
"I can tell you as a user that if any library I was using in my application called System.exit without my consent would result in a lot of frustration."

Also, do you have any examples of other products that do `System.exit(-1);` when something goes wrong?

On Tue, 13/03/2018 at 19:07 -0400, Dmitriy Setrakyan wrote:
> On Tue, Mar 13, 2018 at 6:55 PM, Dmitry Pavlov <dpavlov....@gmail.com>
> wrote:
>
> > What do you think if stop is default for all cases?
> >
> > Kill is configurable.
> >
> > We can consider enforse sockets close for 'stop'. This will allow to ignore
> > hang node by rest of the cluster.
> >
>
> Dmitriy, I see that you cannot come to terms with stopping a process that
> was not started by Ignite. However, in majority of the deployments, users
> would prefer that you would "kill" the process instead of leaving it
> running in a "frozen" state. Frozen state is non-deterministic and it is
> impossible to create a recovery for it. Killing the process is very
> deterministic and can be recovered by restarting it in most cases.
>
> "stop" does not fix the problem we are trying to solve. The whole point is
> to prevent frozen state, and "stop" without "kill" does not prevent it. I
> am OK if "stop+kill" is the default behavior, which means that we try a
> graceful shutdown and then always kill the process anyway.
>
> I think we should have the following configurable options:
> - "stop+kill" (default)
> - "kill"
> - "stop"
> - "stop+restart" (if stop fails, we should kill regardless)
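
To make the four options quoted above easier to compare, here is a minimal sketch in Java of how they could be read as a policy. This is only an illustration: `FailurePolicySketch`, `Policy`, and `handle` are hypothetical names, not Ignite APIs, and the handling of "stop+restart" (kill when the stop fails, restart otherwise) is my reading of the parenthetical note. The sketch records the steps it would take instead of terminating the JVM; a real handler might call something like `Runtime.getRuntime().halt(...)` where it records "kill".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the quoted options; none of these names are Ignite APIs. */
final class FailurePolicySketch {
    /** The four configurable options listed in the quoted message. */
    enum Policy { STOP_KILL, KILL, STOP, STOP_RESTART }

    /**
     * Returns the steps a handler would take for the given policy.
     * gracefulStop is injected and returns true if the graceful stop succeeded.
     */
    static List<String> handle(Policy policy, BooleanSupplier gracefulStop) {
        List<String> steps = new ArrayList<>();
        switch (policy) {
            case KILL:
                // No graceful attempt at all.
                steps.add("kill");
                break;
            case STOP:
                // Graceful stop only; a hung stop leaves the node in place.
                steps.add(gracefulStop.getAsBoolean() ? "stopped" : "stop-failed");
                break;
            case STOP_KILL:
                // Try a graceful shutdown, then always kill the process anyway.
                steps.add(gracefulStop.getAsBoolean() ? "stopped" : "stop-failed");
                steps.add("kill");
                break;
            case STOP_RESTART:
                // Assumption: restart after a clean stop, kill if the stop fails.
                if (gracefulStop.getAsBoolean()) {
                    steps.add("stopped");
                    steps.add("restart");
                } else {
                    steps.add("stop-failed");
                    steps.add("kill");
                }
                break;
        }
        return steps;
    }
}
```

Under this reading, "stop" is the only option that can leave a node running after a failed shutdown, and the out-of-disk-space case above is where the choice between "stop" and the "kill" variants matters most for an embedding application.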