Ivan,

Is there a performance difference between LOG_ONLY and LOG_ONLY_SAFE?

D.

On Thu, Mar 15, 2018 at 4:23 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:

> Igniters and especially Native Persistence experts,
>
> We decided to change the default WAL mode from DEFAULT (FSYNC) to
> LOG_ONLY in the 2.4 release. That was a difficult decision: we
> sacrificed power loss / OS crash tolerance, but gained a significant
> performance boost. From my perspective, LOG_ONLY is the right choice,
> but it still misses some critical features that a default mode should
> have.
>
> Let's focus on the exact guarantees each mode provides. The
> documentation explains it in a pretty simple manner: LOG_ONLY - writes
> survive process crash, FSYNC - writes survive power loss scenarios. I
> have to note that the documentation doesn't describe what exactly can
> happen to a node in LOG_ONLY mode in case of a power loss / OS crash.
> Basically, there are two possible negative outcomes: loss of the last
> several updates (exactly what can happen in BACKGROUND mode in case of
> a process crash) and total storage corruption (not only the last
> updates, but all data will be lost). I've done some quick research on
> this and came to the conclusion that power loss in LOG_ONLY can lead
> to storage corruption. There are several explanations for this:
>
> 1) IgniteWriteAheadLogManager#fsync is kind of broken - it doesn't
> perform an actual fsync unless the current WAL mode is FSYNC. We call
> this method when we write a checkpoint marker to the WAL. Since the
> part of the WAL before the checkpoint marker may not be synced, the
> "physical" records that are necessary for crash recovery in the "Node
> stopped in the middle of checkpoint" scenario may be corrupted after
> power loss. If that happens, we won't be able to recover internal data
> structures, which means loss of all data.
>
> 2) We don't fsync WAL archive files unless the current WAL mode is
> FSYNC. The WAL archive can contain necessary "physical" records as
> well, which leads us to the case described above.
>
> 3) We do perform fsync on rollover (switch of the current WAL segment)
> in all modes, but only when there's enough space to write the switch
> segment record - see FileWriteHandle#close. So there's a small chance
> that we'll skip the fsync and bump into the same case.
>
> Enforcing fsync in those three situations will give us a guarantee
> that LOG_ONLY will survive power loss scenarios, with the possibility
> of losing the last several updates. There can still be a total binary
> mess in the last part of the WAL, but as long as we perform a CRC
> check during WAL replay, we'll detect the start of that mess. Extra
> fsyncs may cause slight performance degradation - all writes will have
> to wait for one fsync on every rollover and checkpoint. It's still
> much faster than fsync on every write to the WAL - I expect a drop of
> a few percent (0-5%) compared to the current LOG_ONLY. But degradation
> is degradation, and LOG_ONLY mode without extra fsyncs makes sense as
> well - that's why we need to introduce "LOG_ONLY + extra fsyncs" as a
> separate WAL mode. I think we should make it the default - it provides
> a significant durability bonus at the cost of one extra fsync for each
> WAL segment written.
>
> To sum it up, I propose a new set of possible WAL modes:
>
> NONE - both process crash and power loss can lead to corruption
> BACKGROUND - process crash can lead to loss of last updates, power
> loss can lead to corruption
> LOG_ONLY - writes survive process crash, power loss can lead to
> corruption
> LOG_ONLY_SAFE (default) - writes survive process crash, power loss can
> lead to loss of last updates
> FSYNC - writes survive both process crash and power loss
>
> Thoughts?
>
> Best Regards,
> Ivan Rakov
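
To make the CRC argument from the quoted proposal concrete, here is a minimal sketch of a replay loop that stops at the first record failing validation. It is not Ignite's actual WAL format or code: the record layout ([length][crc32][payload]) and the names WalReplaySketch, encode, replay and concat are assumptions made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch; the record layout is an assumption, not Ignite's format.
class WalReplaySketch {
    /** Encodes one record as [length:int][crc32:int][payload]. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(8 + payload.length);
        buf.putInt(payload.length);
        buf.putInt((int) crc.getValue());
        buf.put(payload);
        return buf.array();
    }

    /** Returns the valid prefix of records; stops where the "mess" begins. */
    static List<byte[]> replay(byte[] wal) {
        List<byte[]> records = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(wal);
        while (buf.remaining() >= 8) {
            int len = buf.getInt();
            int storedCrc = buf.getInt();
            // A non-positive or oversized length means a torn or garbage header.
            if (len <= 0 || len > buf.remaining())
                break;
            byte[] payload = new byte[len];
            buf.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload, 0, len);
            // A CRC mismatch marks the start of the unsynced, corrupted tail.
            if ((int) crc.getValue() != storedCrc)
                break;
            records.add(payload);
        }
        return records;
    }

    /** Concatenates encoded records into one in-memory "segment". */
    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts)
            total += p.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] p : parts)
            buf.put(p);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] wal = concat(
            encode("put k1".getBytes(StandardCharsets.UTF_8)),
            encode("put k2".getBytes(StandardCharsets.UTF_8)),
            encode("put k3".getBytes(StandardCharsets.UTF_8)));
        System.out.println("intact: " + replay(wal).size());
        wal[wal.length - 1] ^= 0x5A; // simulate a torn last write after power loss
        System.out.println("after corruption: " + replay(wal).size());
    }
}
```

The sketch only covers detection of a damaged tail. It relies on everything before the last checkpoint marker being durable, which is what the extra fsyncs on checkpoint, archive and rollover are meant to ensure; in plain Java that would typically be a FileChannel.force(true) call at those points.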