I’m totally for replacing ‘crashed’ with ‘stopped’. As for waiting for checkpoint completion, I would NOT make it the default behavior and would rather check the ‘cancel’ flag to make the decision. If ‘cancel’ is ‘true’ (which is the default), then we don’t wait for the completion and should print a message saying that ‘cancel’ has to be set to ‘false’ explicitly if the user prefers to wait until the checkpoint is over before shutting down a node.
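To make the proposal above concrete, here is a rough sketch of the decision in plain Java. It is not Ignite code: the class `StopPolicySketch`, the methods `shouldWaitForCheckpoint` and `stopHint`, and the wording of the message are all hypothetical and only model the rule "wait for the checkpoint only when 'cancel' is 'false', otherwise print a hint".

```java
/** Illustrative model of the proposed stop rule; none of these names exist in Ignite. */
final class StopPolicySketch {
    /** Hypothetical wording of the hint printed when the checkpoint is not awaited. */
    static final String HINT =
        "Checkpoint is in progress and will not be awaited because 'cancel' is 'true'. "
            + "Set 'cancel' to 'false' explicitly to wait for the checkpoint before node stop.";

    private StopPolicySketch() {
        // Static helpers only.
    }

    /** Wait only if a checkpoint is running and the caller explicitly passed cancel = false. */
    public static boolean shouldWaitForCheckpoint(boolean cancel, boolean checkpointInProgress) {
        return checkpointInProgress && !cancel;
    }

    /** Returns the hint to log, or null when there is nothing to tell the user. */
    public static String stopHint(boolean cancel, boolean checkpointInProgress) {
        return (checkpointInProgress && cancel) ? HINT : null;
    }

    public static void main(String[] args) {
        // cancel = true (the default): do not wait, but tell the user how to opt in.
        System.out.println(shouldWaitForCheckpoint(true, true));
        System.out.println(stopHint(true, true));

        // cancel = false: wait for the checkpoint, no hint needed.
        System.out.println(shouldWaitForCheckpoint(false, true));
    }
}
```

With this rule the default `close()` path stays fast, and waiting for the checkpoint becomes an explicit opt-in by the caller.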
--
Denis

> On Aug 4, 2017, at 4:31 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
>
> My vote is still for making message softer (crashed -> stopped) and keeping
> logic as is.
>
> Example with File.close() is good, but I think it's not the case here. The
> state on disk after node stop *will not* reflect all user actions made before
> Ignite.close() call, independent of whether node was stopped during
> checkpoint.
> Ignite will recover to actual state anyway, the only difference is WAL replay
> algorithm (stopping during checkpoint will force Ignite to replay delta
> records).
>
> However, waiting for checkpoint on node stop brings two advantages:
> 1) Next start will be faster - less WAL records to replay.
> 2) Partition files will be locally consistent after node stop. User will be
> able to save partition file for any kind of analysis.
>
> Are they strong enough to force user to wait on stop?
>
> Best Regards,
> Ivan Rakov
>
> On 04.08.2017 13:42, Vyacheslav Daradur wrote:
>> Hi guys, I'll just add my opinion if you don't mind.
>>
>>> May be we should implement Vladimir's suggestion to flush the pages without
>>> respect to the cancel flag? Are there any thoughts on this?
>> I think It's good suggestion.
>> But in case of unit-testing a developer usually call #stopAllGrids() at the
>> end of all tests.
>> The method GridAbstactTest#stopAllGrids() is built on top of the
>> method G.stop(name, true) including.
>> IMO in that case checkpoints' flushing isn't necessary.
>>
>>
>> 2017-08-04 13:25 GMT+03:00 Dmitry Pavlov <dpavlov....@gmail.com>:
>>
>>> Thank you all for replies.
>>>
>>> I like idea to replace 'crashed' to 'stop'. 'crashed' word is really
>>> confusing.
>>>
>>> But still, if I call close() on file, all data is flushed to disk. But for
>>> ignite.close() checkpoint may be not finished.
>>>
>>> May be we should implement Vladimir's suggestion to flush the pages without
>>> respect to the cancel flag? Are there any thoughts on this?
>>>
>>> Fri, Aug 4, 2017 at 11:12, Vladimir Ozerov <voze...@gridgain.com>:
>>>
>>>> Ivan,
>>>>
>>>> Hanging on Ignite.close() will confuse user no more than restore on start
>>>> after graceful shutdown. IMO correct approach here would be to:
>>>> 1) wait for checkpoint completion irrespective of "cancel" flag, because
>>>> this flag relates to compute jobs only as per documentation
>>>> 2) print an INFO message to the log that we are saving a checkpoint due to
>>>> node stop.
>>>>
>>>> On Fri, Aug 4, 2017 at 10:54 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
>>>>> Dmitriy,
>>>>>
>>>>> From my point of view, invoking stop(true) is correct behaviour.
>>>>>
>>>>> Stopping node in the middle of checkpoint is absolutely valid case. That's
>>>>> how persistence works - node will restore memory state if stopped at any
>>>>> moment.
>>>>> On the other hand, checkpoint may last for a long time. Thread hanging on
>>>>> Ignite.close() may confuse user much more than "crashed in the middle of
>>>>> checkpoint" message.
>>>>>
>>>>> Best Regards,
>>>>> Ivan Rakov
>>>>>
>>>>>
>>>>> On 03.08.2017 22:34, Dmitry Pavlov wrote:
>>>>>
>>>>>> Hi Igniters,
>>>>>>
>>>>>> I’ve created the simplest example using Ignite 2.1 and persistence (see
>>>>>> the code below). I've included Ignite instance into try-with-resources (I
>>>>>> think it is default approach for AutoCloseable inheritors).
>>>>>>
>>>>>> But next time when I started this server I got message: “Ignite node
>>>>>> crashed in the middle of checkpoint. Will restore memory state and enforce
>>>>>> checkpoint on node start.”
>>>>>>
>>>>>> This happens because in close() method we don’t wait checkpoint to end. I
>>>>>> am afraid this behaviour may confuse users on the first use of the
>>>>>> product.
>>>>>>
>>>>>> What do you think if we change Ignite.close() functioning from stop(true)
>>>>>> to stop(false)? This will allow to wait checkpoints to finish by default.
>>>>>> Alternatively, we may improve example to show how to shutdown server node
>>>>>> correctly. Current PersistentStoreExample does not cover server node
>>>>>> shutdown.
>>>>>>
>>>>>> Any concerns on close() method change?
>>>>>>
>>>>>> Sincerely,
>>>>>> Dmitriy Pavlov
>>>>>>
>>>>>>
>>>>>> IgniteConfiguration cfg = new IgniteConfiguration();
>>>>>> cfg.setPersistentStoreConfiguration(new PersistentStoreConfiguration());
>>>>>>
>>>>>> try (Ignite ignite = Ignition.start(cfg)) {
>>>>>>     ignite.active(true);
>>>>>>     IgniteCache<String, String> cache = ignite.getOrCreateCache("test");
>>>>>>     for (int i = 0; i < 1000; i++)
>>>>>>         cache.put("Key" + i, "Value" + i);
>>>>>> }