Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov Fri, 16 Mar 2018 00:55:36 -0700

Vladimir,

Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless
power loss has happened.
Seems like we need to measure the performance difference to decide
whether we need a separate WAL mode. If it is invisible, we'll just fix
these bugs without introducing a new mode; if it is perceptible, we'll
continue the discussion about introducing LOG_ONLY_SAFE.

Makes sense?


Best Regards,
Ivan Rakov

On 16.03.2018 10:45, Dmitry Pavlov wrote:

Folks, I do not expect any performance degradation here under high load
because we already do fsync on rollover. So the extra fsyncs will be
almost free. We should do this fsync without holding the CP lock, of
course.

(see also point 3:
3) We do perform fsync on rollover (switch of current WAL segment) in all
modes, but only when there's enough space to write switch segment record -
see FileWriteHandle#close. So there's a small chance that we'll skip
fsync and bump into the same case)

++1 from me for changing LOG_ONLY to be safe in all cases
+1 for creating a new mode, 'Log only safe'

On Fri, 16 Mar 2018 at 10:31, Vladimir Ozerov <[email protected]> wrote:

Same question. It would be very difficult to explain these two modes to
users. We should do our best to fix LOG_ONLY first. Without these
guarantees there is no reason to keep LOG_ONLY at all: users could simply
use BACKGROUND with a high flush frequency. This is precisely how
Cassandra works.

p.1 - sounds like a bug
p.2 - sounds like a bug as well; hopefully it will not introduce a serious
performance hit unless we write too much data to the WAL, which would mean
that we should work on its optimization (e.g. free list update overhead,
no delta updates, etc).
p.3 - sounds like a bug as well

On Fri, Mar 16, 2018 at 8:17 AM, Dmitriy Setrakyan <[email protected]>
wrote:

Ivan,

Is there a performance difference between LOG_ONLY and LOG_ONLY_SAFE?

D.

On Thu, Mar 15, 2018 at 4:23 PM, Ivan Rakov <[email protected]>
wrote:

Igniters and especially Native Persistence experts,

We decided to change the default WAL mode from DEFAULT (FSYNC) to
LOG_ONLY in the 2.4 release. That was a difficult decision: we sacrificed
power loss / OS crash tolerance, but gained a significant performance
boost. From my perspective, LOG_ONLY is the right choice, but it still
lacks some critical features that a default mode should have.

Let's focus on the exact guarantees each mode provides. The documentation
explains it in a pretty simple manner: LOG_ONLY - writes survive process
crash, FSYNC - writes survive power loss scenarios. I have to note that
the documentation doesn't describe what exactly can happen to a node in
LOG_ONLY mode in case of a power loss / OS crash. Basically, there are
two possible negative outcomes: loss of the last several updates (exactly
what can happen in BACKGROUND mode in case of a process crash) and total
storage corruption (not only the last updates, but all data will be
lost). I did some quick research on this and came to the conclusion that
power loss in LOG_ONLY can lead to storage corruption. There are several
explanations for this:

1) IgniteWriteAheadLogManager#fsync is kind of broken - it doesn't
perform an actual fsync unless the current WAL mode is FSYNC. We call
this method when we write a checkpoint marker to the WAL. As long as the
part of the WAL before the checkpoint marker may not be synced, the
"physical" records that are necessary for crash recovery in the "Node
stopped in the middle of checkpoint" scenario may be corrupted after
power loss. If that happens, we won't be able to recover internal data
structures, which means loss of all data.
2) We don't fsync WAL archive files unless the current WAL mode is FSYNC.
The WAL archive can contain necessary "physical" records as well, which
leads us to the case described above.
3) We do perform fsync on rollover (switch of the current WAL segment) in
all modes, but only when there's enough space to write the switch segment
record - see FileWriteHandle#close. So there's a small chance that we'll
skip the fsync and bump into the same case.
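
The three points above come down to where an explicit fsync is issued.
Here is a minimal Java sketch of that idea. The WalFsyncSketch class, its
Mode enum and its method names are hypothetical and are not Ignite code;
the only real API used is the JDK's FileChannel#force. As a
simplification, plain LOG_ONLY in the sketch never syncs on rollover,
whereas point 3 says the current code skips that fsync only occasionally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not Ignite code: shows where explicit fsyncs would go.
final class WalFsyncSketch {
    // Mode names follow the thread; LOG_ONLY_SAFE is the proposed mode.
    enum Mode { LOG_ONLY, LOG_ONLY_SAFE, FSYNC }

    private final FileChannel ch;
    private final Mode mode;
    private int fsyncs; // number of explicit force() calls, for illustration

    WalFsyncSketch(Path segment, Mode mode) throws IOException {
        this.ch = FileChannel.open(segment,
            StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.mode = mode;
    }

    // Ordinary record: handed to the OS; synced immediately only in FSYNC mode.
    void append(String record) throws IOException {
        write(record);
        if (mode == Mode.FSYNC) fsync();
    }

    // Checkpoint marker: everything before it should be durable (point 1).
    void appendCheckpointMarker() throws IOException {
        write("CHECKPOINT");
        if (mode != Mode.LOG_ONLY) fsync();
    }

    // Rollover: sync unconditionally before closing the segment (point 3).
    void closeSegment() throws IOException {
        if (mode != Mode.LOG_ONLY) fsync();
        ch.close();
    }

    private void write(String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) ch.write(buf);
    }

    private void fsync() throws IOException {
        ch.force(true); // ask the OS to flush file data and metadata to the device
        fsyncs++;
    }

    // Writes two updates and a checkpoint marker, closes the segment,
    // and returns how many explicit fsyncs that took in the given mode.
    static int fsyncsFor(Mode mode) {
        try {
            Path segment = Files.createTempFile("wal-sketch", ".seg");
            try {
                WalFsyncSketch wal = new WalFsyncSketch(segment, mode);
                wal.append("update-1");
                wal.append("update-2");
                wal.appendCheckpointMarker();
                wal.closeSegment();
                return wal.fsyncs;
            } finally {
                Files.deleteIfExists(segment);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values())
            System.out.println(m + ": " + fsyncsFor(m) + " fsync(s)");
    }
}
```

Under these assumptions, two updates followed by one checkpoint and one
rollover cost no fsyncs in LOG_ONLY, two in LOG_ONLY_SAFE and four in
FSYNC.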

Enforcing fsync in these three situations will give us a guarantee that
LOG_ONLY will survive power loss scenarios, with the possibility of
losing the last several updates. There can still be a total binary mess
in the last part of the WAL, but as long as we perform a CRC check during
WAL replay, we'll detect the start of that mess. Extra fsyncs may cause
slight performance degradation - all writes will have to wait for one
fsync on every rollover and checkpoint. It's still much faster than an
fsync on every write to the WAL - I expect a few percent (0-5%) drop
compared to the current LOG_ONLY. But degradation is degradation, and a
LOG_ONLY mode without extra fsyncs makes sense as well - that's why we
need to introduce "LOG_ONLY + extra fsyncs" as a separate WAL mode. I
think we should make it the default - it provides a significant
durability bonus for the cost of one extra fsync for each WAL segment
written.
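
The CRC argument can be sketched in a few lines of Java. The record
layout used here ([length][payload][CRC32 of payload]) and the
WalCrcReplaySketch class are assumptions made for illustration and are
not Ignite's actual WAL format; only java.util.zip.CRC32 and ByteBuffer
are real JDK APIs. Replay stops at the first record that is truncated or
fails its checksum, so a garbled tail costs only the last updates.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch, not Ignite's WAL format or replay code.
final class WalCrcReplaySketch {
    // Assumed record layout: [int length][payload][int CRC32 of payload].
    static byte[] encode(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length + 4)
            .putInt(payload.length)
            .put(payload)
            .putInt(crcOf(payload))
            .array();
    }

    // Returns the payloads of all records up to the first invalid one.
    static List<byte[]> replay(byte[] log) {
        List<byte[]> good = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(log);
        while (buf.remaining() >= 8) {
            int len = buf.getInt();
            // A non-positive or oversized length means garbage or a torn tail.
            if (len <= 0 || len > buf.remaining() - 4) break;
            byte[] payload = new byte[len];
            buf.get(payload);
            int storedCrc = buf.getInt();
            if (crcOf(payload) != storedCrc) break; // start of the "binary mess"
            good.add(payload);
        }
        return good;
    }

    // Joins encoded records into one in-memory "segment".
    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) total += p.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] p : parts) buf.put(p);
        return buf.array();
    }

    private static int crcOf(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        byte[] log = concat(
            encode(new byte[] {1, 2, 3}),
            encode(new byte[] {4, 5}),
            encode(new byte[] {6, 7, 8, 9}));
        System.out.println("intact: " + replay(log).size() + " record(s)");
        log[log.length - 6] ^= 0x5A; // corrupt one payload byte of the last record
        System.out.println("corrupted tail: " + replay(log).size() + " record(s)");
    }
}
```

With the last record damaged, replay in this sketch keeps the first two
records and drops the third, which matches the "loss of the last
updates" outcome rather than total corruption.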

To sum it up, I propose a new set of possible WAL modes:
NONE - both process crash and power loss can lead to corruption
BACKGROUND - process crash can lead to last updates loss, power loss can
lead to corruption
LOG_ONLY - writes survive process crash, power loss can lead to
corruption
LOG_ONLY_SAFE (default) - writes survive process crash, power loss can
lead to last updates loss
FSYNC - writes survive both process crash and power loss

Thoughts?


Best Regards,
Ivan Rakov
