Ivan, It's all good then :) Thanks!
-Val On Mon, Mar 26, 2018 at 1:50 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote: > Val, > > There's no any sense to use WalMode.NONE in production environment, it's > kept for testing and debugging purposes (including possible user activities > like capacity planning). > We already print a warning at node start in case WalMode.NONE is set: > > U.quietAndWarn(log,"Started write-ahead log manager in NONE mode, >> persisted data may be lost in " + >> "a case of unexpected node failure. Make sure to deactivate the >> cluster before shutdown."); >> > > Best Regards, > Ivan Rakov > > > On 24.03.2018 1:40, Valentin Kulichenko wrote: > >> Dmitry, >> >> Thanks for clarification. So it sounds like if we fix all other modes as >> we >> discuss here, NONE would be the only one allowing corruption. I also don't >> see much sense in this and I think we should clearly state this in the >> doc, >> as well print out a warning if NONE mode is used. Eventually, if it's >> confirmed that there are no reasonable use cases for it, we can deprecate >> it. >> >> -Val >> >> On Fri, Mar 23, 2018 at 3:26 PM, Dmitry Pavlov <dpavlov....@gmail.com> >> wrote: >> >> Hi Val, >>> >>> NONE means that the WAL log is disabled and not written at all. Use of >>> the >>> mode is at your own risk. It is possible that restore state after the >>> crash >>> at the middle of checkpoint will not succeed. I do not see much sence in >>> it, especially in production. >>> >>> BACKGROUND is full functional WAL mode, but allows some delay before >>> flush >>> to disk. >>> >>> Sincerely, >>> Dmitriy Pavlov >>> >>> сб, 24 мар. 2018 г. в 1:07, Valentin Kulichenko < >>> valentin.kuliche...@gmail.com>: >>> >>> I agree. In my view, any possibility to get a corrupted storage is a bug >>>> which needs to be fixed. >>>> >>>> BTW, can someone explain semantics of NONE mode? What is the difference >>>> from BACKGROUND from user's perspective? Is there any particular use >>>> case >>>> where it can be used? >>>> >>>> -Val >>>> >>>> On Fri, Mar 23, 2018 at 2:49 AM, Dmitry Pavlov <dpavlov....@gmail.com> >>>> wrote: >>>> >>>> Hi Ivan, >>>>> >>>>> IMO we have to add extra FSYNCS for BACKGROUND WAL. Agree? >>>>> >>>>> Sincerely, >>>>> Dmitriy Pavlov >>>>> >>>>> пт, 23 мар. 2018 г. в 12:23, Ivan Rakov <ivan.glu...@gmail.com>: >>>>> >>>>> Igniters, there's another important question about this matter. >>>>>> Do we want to add extra FSYNCS for BACKGROUND WAL mode? I think that >>>>>> >>>>> we >>> >>>> have to do it: it will cause similar performance drop, but if we >>>>>> consider LOG_ONLY broken without these fixes, BACKGROUND is broken as >>>>>> >>>>> well. >>>>> >>>>>> Best Regards, >>>>>> Ivan Rakov >>>>>> >>>>>> On 23.03.2018 10:27, Ivan Rakov wrote: >>>>>> >>>>>>> Fixes are quite simple. >>>>>>> I expect them to be merged in master in a week in worst case. >>>>>>> >>>>>>> Best Regards, >>>>>>> Ivan Rakov >>>>>>> >>>>>>> On 22.03.2018 17:49, Denis Magda wrote: >>>>>>> >>>>>>>> Ivan, >>>>>>>> >>>>>>>> How quick are you going to merge the fix into the master? Many >>>>>>>> persistence >>>>>>>> related optimizations have already stacked up. Probably, we can >>>>>>>> >>>>>>> release >>>>> >>>>>> them sooner if the community agrees. >>>>>>>> >>>>>>>> -- >>>>>>>> Denis >>>>>>>> >>>>>>>> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov < >>>>>>>> >>>>>>> ivan.glu...@gmail.com> >>> >>>> wrote: >>>>>>>> >>>>>>>> Thanks all! >>>>>>>>> We seem to have reached a consensus on this issue. I'll just add >>>>>>>>> necessary >>>>>>>>> fsyncs under IGNITE-7754. >>>>>>>>> >>>>>>>>> Best Regards, >>>>>>>>> Ivan Rakov >>>>>>>>> >>>>>>>>> >>>>>>>>> On 22.03.2018 15:13, Ilya Lantukh wrote: >>>>>>>>> >>>>>>>>> +1 for fixing LOG_ONLY. If current implementation doesn't >>>>>>>>>> >>>>>>>>> protect >>> >>>> from >>>>> >>>>>> data >>>>>>>>>> corruption, it doesn't make sence. >>>>>>>>>> >>>>>>>>>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda < >>>>>>>>>> >>>>>>>>> dma...@apache.org> >>> >>>> wrote: >>>>>>>>>> >>>>>>>>>> +1 for the fix of LOG_ONLY >>>>>>>>>> >>>>>>>>>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk < >>>>>>>>>>> alexey.goncha...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the >>>>>>>>>>> provided >>>>>>>>>>> >>>>>>>>>>>> performance results. >>>>>>>>>>>> >>>>>>>>>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov < >>>>>>>>>>>> >>>>>>>>>>> voze...@gridgain.com >>>> >>>>> : >>>>>> >>>>>>> +1 for accepting drop in LOG_ONLY. 7% is not that much and >>>>>>>>>>>> >>>>>>>>>>> not a >>> >>>> drop >>>>>>>>>>>> at >>>>>>>>>>>> all, provided that we fixing a bug. I.e. should we implement >>>>>>>>>>>> >>>>>>>>>>> it >>> >>>> correctly >>>>>>>>>>>> in the first place we would never notice any "drop". >>>>>>>>>>>> >>>>>>>>>>>>> I do not understand why someone would like to use current >>>>>>>>>>>>> >>>>>>>>>>>> broken >>>> >>>>> mode. >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov >>>>>>>>>>>>> <dpavlov....@gmail.com> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, I think option 1 is better. As Val said any mode that >>>>>>>>>>>>> >>>>>>>>>>>> allows >>>> >>>>> corruption >>>>>>>>>>>>> >>>>>>>>>>>>> does not make much sense. >>>>>>>>>>>>>> >>>>>>>>>>>>>> What Ivan mentioned here as drop, in relation to old mode >>>>>>>>>>>>>> >>>>>>>>>>>>> DEFAULT >>>>> >>>>>> (FSYNC >>>>>>>>>>>>> now), is still significant perfromance boost. >>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Dmitriy Pavlov >>>>>>>>>>>>>> >>>>>>>>>>>>>> ср, 21 мар. 2018 г. в 17:56, Ivan Rakov < >>>>>>>>>>>>>> >>>>>>>>>>>>> ivan.glu...@gmail.com >>>> >>>>> : >>>>>> >>>>>>> I've attached benchmark results to the JIRA ticket. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> We observe ~7% drop in "fair" LOG_ONLY_SAFE mode, >>>>>>>>>>>>>>> >>>>>>>>>>>>>> independent >>> >>>> of >>>>> >>>>>> WAL >>>>>>>>>>>>>> >>>>>>>>>>>>> compaction enabled flag. It's pretty significant drop: WAL >>>>>>>>>>>> >>>>>>>>>>>>> compaction >>>>>>>>>>>>>> >>>>>>>>>>>>> itself gives only ~3% drop. >>>>>>>>>>>> >>>>>>>>>>>>> I see two options here: >>>>>>>>>>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be >>>>>>>>>>>>>>> >>>>>>>>>>>>>> ready >>> >>>> to >>>>> >>>>>> release >>>>>>>>>>>>>> AI 2.5 with 7% drop. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it default, add release >>>>>>>>>>>>>>> >>>>>>>>>>>>>> note >>> >>>> to AI >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2.5 >>>>>>>>>>>>>> >>>>>>>>>>>>> that we added power loss durability in default mode, but user >>>>>>>>>>>>> >>>>>>>>>>>> may >>>> >>>>> fallback to previous LOG_ONLY in order to retain >>>>>>>>>>>>>>> >>>>>>>>>>>>>> performance. >>> >>>> Thoughts? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>>> Ivan Rakov >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Val, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If a storage is in >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> corrupted state, does it mean that it needs to be >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> completely >>>> >>>>> removed >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> and >>>>>>>>>>>>> >>>>>>>>>>>>>> cluster needs to be restarted without data? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> be >>>> >>>>> lost, >>>>>>>>>>>>>>> >>>>>>>>>>>>>> but only in *power loss**/ OS crash* case. >>>>>>>>>>>> >>>>>>>>>>>>> kill -9, JVM crash, death of critical system thread and >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> all >>> >>>> other >>>>>>>>>>>>>>>> cases that usually take place are variations of *process >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> crash*. >>>>> >>>>>> All >>>>>>>>>>>>>>> >>>>>>>>>>>>>> WAL modes (except NONE, of course) ensure corruption-safety >>>>>>>>>>>>> >>>>>>>>>>>> in >>> >>>> case >>>>>>>>>>>>>>> >>>>>>>>>>>>>> of >>>>>>>>>>>> >>>>>>>>>>>>> process crash. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> If so, I'm not sure any mode >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> that allows corruption makes much sense to me. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It depends on performance impact of enforcing power-loss >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> corruption >>>>>>>>>>>>>>> >>>>>>>>>>>>>> safety. Price of full protection from power loss is high - >>>>>>>>>>>> >>>>>>>>>>> FSYNC >>> >>>> is >>>>>>>>>>>>>>> >>>>>>>>>>>>>> way slower (2-10 times) than other WAL modes. The question is >>>>>>>>>>>> >>>>>>>>>>>>> whether >>>>>>>>>>>>>>> >>>>>>>>>>>>>> ensuring weaker guarantees (corruption can't happen, but loss >>>>>>>>>>>>> >>>>>>>>>>>> of >>>> >>>>> last >>>>>>>>>>>>>>> >>>>>>>>>>>>>> updates can) will affect performance as badly as strong >>>>>>>>>>>>> >>>>>>>>>>>>>> guarantees. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> I'll share benchmark results soon. >>>>>>>>>>>> >>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>>>> Ivan Rakov >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Guys, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> What do we understand under "data corruption" here? If a >>>>>>>>>>>>>>>>> storage >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> in >>>>>>>>>>>>> >>>>>>>>>>>>> corrupted state, does it mean that it needs to be completely >>>>>>>>>>>>>> >>>>>>>>>>>>>>> removed >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> and >>>>>>>>>>>>> >>>>>>>>>>>>>> cluster needs to be restarted without data? If so, I'm not >>>>>>>>>>>>>>> >>>>>>>>>>>>>> sure >>>> >>>>> any >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> mode >>>>>>>>>>>>> >>>>>>>>>>>>>> that allows corruption makes much sense to me. How am I >>>>>>>>>>>>>>> >>>>>>>>>>>>>> supposed >>>>> >>>>>> to >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> use a >>>>>>>>>>>>> >>>>>>>>>>>>>> database, if virtually any failure can end with complete >>>>>>>>>>>>>>>>> loss of >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> data? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In any case, this definitely should not be a default >>>>>>>>>>>>>> >>>>>>>>>>>>> behavior. >>> >>>> If >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> user ever >>>>>>>>>>>> >>>>>>>>>>>>> switches to corruption-unsafe mode, there should be a >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> clear >>> >>>> warning >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> about >>>>>>>>>>>>> >>>>>>>>>>>>>> this. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Val >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov < >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ivan.glu...@gmail.com> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Ticket to track changes: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>>>>>> Ivan Rakov >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov < >>>>>>>>>>>>>>>>>> ivan.glu...@gmail.com >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Vladimir, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> guarantees >>>>> >>>>>> unless power >>>>>>>>>>>>>>>>>>>> loss has happened. >>>>>>>>>>>>>>>>>>>> Seems like we need to measure performance difference >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> to >>> >>>> decide >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> whether do >>>>>>>>>>>> >>>>>>>>>>>>> we need separate WAL mode. If it will be invisible, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> we'll >>>> >>>>> just >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> fix >>>>>>>>>>>> >>>>>>>>>>>>> these >>>>>>>>>>>>>> >>>>>>>>>>>>>>> bugs without introducing new mode; if it will be >>>>>>>>>>>>>>>>>>>> perceptible, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> we'll >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> continue the discussion about introducing LOG_ONLY_SAFE. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Makes sense? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yes, this sounds like the right approach. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>> >