It really depends on the hardware and the workload pattern. I expect LOG_ONLY_SAFE to be either equal to LOG_ONLY or a few percent slower. We'll be able to answer this for sure after implementing the three fixes and benchmarking. First of all, let's agree on whether the extra durability guarantees make sense. I think they do: power loss is a really unlikely scenario, but LOG_ONLY_SAFE will make it much less risky. It will guarantee that all partitions are still present after a power loss across the whole data center, and it will also make rebalancing after a power loss on a single node much faster.

Best Regards,
Ivan Rakov

On 16.03.2018 8:17, Dmitriy Setrakyan wrote:
Ivan,

Is there a performance difference between LOG_ONLY and LOG_ONLY_SAFE?

D.

On Thu, Mar 15, 2018 at 4:23 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:

Igniters and especially Native Persistence experts,

We decided to change the default WAL mode from DEFAULT (FSYNC) to LOG_ONLY
in the 2.4 release. That was a difficult decision: we sacrificed power loss
/ OS crash tolerance, but gained a significant performance boost. From my
perspective, LOG_ONLY is the right choice, but it still lacks some critical
features that a default mode should have.

Let's focus on the exact guarantees each mode provides. The documentation
explains them in a pretty simple manner: LOG_ONLY - writes survive process
crash, FSYNC - writes survive power loss scenarios. I have to note that the
documentation doesn't describe what exactly can happen to a node in
LOG_ONLY mode in case of a power loss / OS crash. Basically, there are two
possible negative outcomes: loss of the last several updates (exactly what
can happen in BACKGROUND mode in case of a process crash) and total storage
corruption (not only the last updates, but all data will be lost). I did
some quick research on this and came to the conclusion that power loss in
LOG_ONLY mode can lead to storage corruption. There are several reasons
for this:
1) IgniteWriteAheadLogManager#fsync is kind of broken - it doesn't
perform an actual fsync unless the current WAL mode is FSYNC. We call this
method when we write a checkpoint marker to the WAL. Since the part of the
WAL before the checkpoint marker may not be synced, "physical" records that
are necessary for crash recovery in the "Node stopped in the middle of
checkpoint" scenario may be corrupted after power loss. If that happens, we
won't be able to recover internal data structures, which means loss of all
data.
2) We don't fsync WAL archive files unless the current WAL mode is FSYNC.
The WAL archive can contain necessary "physical" records as well, which
leads us to the case described above.
3) We do perform fsync on rollover (switch of the current WAL segment) in
all modes, but only when there's enough space to write the switch segment
record - see FileWriteHandle#close. So there's a small chance that we'll
skip the fsync and run into the same case.
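
To make the fsync points above concrete, here is a minimal sketch of where
each mode would force the log to disk. It is not Ignite code: the class
WalFsyncSketch, its methods and the counter are hypothetical names invented
for illustration, and it models only the checkpoint marker and the rollover
(the WAL archive and the existing conditional rollover fsync in LOG_ONLY
are left out).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch, not Ignite code: where each mode forces the WAL to disk. */
final class WalFsyncSketch {
    enum Mode { LOG_ONLY, LOG_ONLY_SAFE, FSYNC }

    private final FileChannel ch;
    private final Mode mode;
    private int forces; // number of force() calls, kept only for illustration

    WalFsyncSketch(Path file, Mode mode) throws IOException {
        this.ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        this.mode = mode;
    }

    /** Appends a record; only FSYNC mode forces it to disk immediately. */
    void append(byte[] rec) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(rec);
        while (buf.hasRemaining())
            ch.write(buf);
        if (mode == Mode.FSYNC)
            force();
    }

    /** Writes a checkpoint marker; LOG_ONLY_SAFE forces everything written so far. */
    void checkpointMarker(byte[] marker) throws IOException {
        append(marker);
        if (mode == Mode.LOG_ONLY_SAFE)
            force();
    }

    /** Closes the segment; LOG_ONLY_SAFE forces unconditionally before closing. */
    void rollover() throws IOException {
        if (mode == Mode.LOG_ONLY_SAFE)
            force();
        ch.close();
    }

    private void force() throws IOException {
        ch.force(false); // flush file content (not metadata) to the storage device
        forces++;
    }

    /** Writes some records, one checkpoint marker and one rollover; returns the force count. */
    static int forcesFor(Mode mode, int records) {
        try {
            Path file = Files.createTempFile("wal-sketch", ".bin");
            try {
                WalFsyncSketch wal = new WalFsyncSketch(file, mode);
                for (int i = 0; i < records; i++)
                    wal.append(new byte[] {(byte) i});
                wal.checkpointMarker(new byte[] {0x43});
                wal.rollover();
                return wal.forces;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model the number of forced flushes in LOG_ONLY_SAFE does not depend
on the number of records written, while in FSYNC it grows with every write.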

Enforcing fsync in these three situations will give us a guarantee that
LOG_ONLY will survive power loss scenarios, with the possibility of losing
the last several updates. There can still be a total binary mess in the
last part of the WAL, but as long as we perform a CRC check during WAL
replay, we'll detect the start of that mess. Extra fsyncs may cause slight
performance degradation - all writes will have to wait for one fsync on
every rollover and checkpoint. It's still much faster than an fsync on
every write to the WAL - I expect a small drop (0-5%) compared to the
current LOG_ONLY. But degradation is degradation, and LOG_ONLY mode without
extra fsyncs makes sense as well - that's why we need to introduce
"LOG_ONLY + extra fsyncs" as a separate WAL mode. I think we should make it
the default - it provides a significant durability bonus for the cost of
one extra fsync for each WAL segment written.
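
To illustrate how a CRC check during replay can find the start of the
"binary mess", here is a minimal sketch. The record layout (4-byte length,
payload, 4-byte CRC32 of the payload) and the names WalCrcReplaySketch,
encode and validPrefixLength are assumptions made for this example; the
real WAL record format is different.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical sketch, not Ignite code: find the valid prefix of a log by CRC. */
final class WalCrcReplaySketch {
    /** Encodes one record as [int length][payload][int crc32(payload)]. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 4);
        buf.putInt(payload.length).put(payload).putInt((int) crc.getValue());
        return buf.array();
    }

    /** Returns the offset just past the last record whose CRC matches. */
    static int validPrefixLength(byte[] wal) {
        ByteBuffer buf = ByteBuffer.wrap(wal);
        int validEnd = 0;
        while (buf.remaining() >= 8) {
            int len = buf.getInt();
            // A non-positive or oversized length means a zero-filled or garbage tail.
            if (len <= 0 || len > buf.remaining() - 4)
                break;
            byte[] payload = new byte[len];
            buf.get(payload);
            int storedCrc = buf.getInt();
            CRC32 crc = new CRC32();
            crc.update(payload, 0, len);
            if ((int) crc.getValue() != storedCrc)
                break; // first corrupted record: replay stops here
            validEnd = buf.position();
        }
        return validEnd;
    }
}
```

Replay would apply records only up to the returned offset, so a torn tail
costs the last updates rather than the whole storage.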

To sum up, I propose a new set of possible WAL modes:
NONE - both process crash and power loss can lead to corruption
BACKGROUND - process crash can lead to loss of the last updates, power loss
can lead to corruption
LOG_ONLY - writes survive process crash, power loss can lead to corruption
LOG_ONLY_SAFE (default) - writes survive process crash, power loss can
lead to loss of the last updates
FSYNC - writes survive both process crash and power loss
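
The same list can be written down as a small lookup, which may be handy for
documentation or tests. This is only a sketch of the proposal: the names
ProposedWalModesSketch, Outcome and storageSurvivesPowerLoss are
hypothetical and do not exist in Ignite, where LOG_ONLY_SAFE is not (yet) a
WAL mode.

```java
/** Hypothetical sketch of the proposed guarantees; not an existing Ignite enum. */
final class ProposedWalModesSketch {
    enum Outcome { WRITES_SURVIVE, LAST_UPDATES_LOST, CORRUPTION }

    enum Mode {
        NONE(Outcome.CORRUPTION, Outcome.CORRUPTION),
        BACKGROUND(Outcome.LAST_UPDATES_LOST, Outcome.CORRUPTION),
        LOG_ONLY(Outcome.WRITES_SURVIVE, Outcome.CORRUPTION),
        LOG_ONLY_SAFE(Outcome.WRITES_SURVIVE, Outcome.LAST_UPDATES_LOST),
        FSYNC(Outcome.WRITES_SURVIVE, Outcome.WRITES_SURVIVE);

        final Outcome onProcessCrash;
        final Outcome onPowerLoss;

        Mode(Outcome onProcessCrash, Outcome onPowerLoss) {
            this.onProcessCrash = onProcessCrash;
            this.onPowerLoss = onPowerLoss;
        }

        /** True if power loss costs at most the last updates, never the whole storage. */
        boolean storageSurvivesPowerLoss() {
            return onPowerLoss != Outcome.CORRUPTION;
        }
    }
}
```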

Thoughts?


Best Regards,
Ivan Rakov


