[jira] [Commented] (RATIS-1879) Handle RaftLog corruption when unsafe flush is enabled.

Xinyu Tan (Jira) Mon, 28 Aug 2023 19:28:20 -0700


    [ https://issues.apache.org/jira/browse/RATIS-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759780#comment-17759780 ]


Xinyu Tan commented on RATIS-1879:
----------------------------------

{code:java}
private void flushIfNecessary() throws IOException {
  if (shouldFlush()) {
    raftLogMetrics.onRaftLogFlush();
    LOG.debug("{}: flush {}", name, out);
    try(UncheckedAutoCloseable ignored = raftLogMetrics.startFlushTimer()) {
      final CompletableFuture<Void> f = stateMachine != null ?
          stateMachine.data().flush(lastWrittenIndex) :
          CompletableFuture.completedFuture(null);
      if (stateMachineDataPolicy.isSync()) {
        stateMachineDataPolicy.getFromFuture(f, () -> this + "-flushStateMachineData");
      }
      flushBatchSize = (int)(lastWrittenIndex - flushIndex.get());
      if (unsafeFlush) {
        // unsafe-flush: call updateFlushedIndexIncreasingly() without waiting
        // the underlying FileChannel.force(..).
        unsafeFlushOutStream();
        updateFlushedIndexIncreasingly();
      } else if (asyncFlush) {
        asyncFlushOutStream(f);
      } else {
        flushOutStream();
        if (!stateMachineDataPolicy.isSync()) {
          IOUtils.getFromFuture(f, () -> this + "-flushStateMachineData");
        }
        updateFlushedIndexIncreasingly();
      }
    }
  }
}
{code}
Hi, it seems this issue is not limited to unsafeFlush or asyncFlush. As long
as we do not invoke fsync on every write, the data may still sit only in the
page cache, and because we cannot control the operating system's disk
flushing strategy, the same corruption might arise.
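
To illustrate the distinction, here is a minimal sketch. It is not Ratis code;
the class name DurableAppendSketch and the method append are hypothetical.
FileChannel.write(..) may return once the bytes are in the OS page cache,
while FileChannel.force(true) asks the OS to write the file's content and
metadata to the storage device before returning.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of Ratis.
public class DurableAppendSketch {

  /**
   * Appends one record to the file and returns the resulting file size.
   * When sync is false, the bytes may still sit only in the OS page cache
   * after this method returns, so a VM or host crash can lose or tear them.
   */
  public static long append(Path file, byte[] record, boolean sync) throws IOException {
    try (FileChannel ch = FileChannel.open(file,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
      final ByteBuffer buf = ByteBuffer.wrap(record);
      while (buf.hasRemaining()) {
        ch.write(buf); // hands the bytes to the OS; no durability guarantee yet
      }
      if (sync) {
        ch.force(true); // fsync-like: flush content and metadata to the device
      }
      return ch.size();
    }
  }

  public static void main(String[] args) throws IOException {
    final Path file = Files.createTempFile("raftlog-sketch", ".log");
    final byte[] entry = "entry\n".getBytes(StandardCharsets.UTF_8);
    System.out.println("size after unsynced append: " + append(file, entry, false));
    System.out.println("size after synced append:   " + append(file, entry, true));
    Files.deleteIfExists(file);
  }
}
```

Both calls look identical to a reader of the file while the machine stays up;
the difference only shows after a crash, which is why an index advanced
without a completed force(..) can point past what is actually on disk.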

> Handle RaftLog corruption when unsafe flush is enabled.
> -------------------------------------------------------
>
>                 Key: RATIS-1879
>                 URL: https://issues.apache.org/jira/browse/RATIS-1879
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.0.0, 2.5.1
>            Reporter: Song Ziyang
>            Assignee: Tsz-wo Sze
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> During normal operation of the RaftServer, the virtual machine (VM) hosting
> it was unexpectedly shut down and then restarted. After the VM reboot,
> *our attempts to restart the RaftServer hit an exception indicating
> corruption in the Raft Log.*
> *For details of this exception, please refer to
> [https://apache-iotdb.feishu.cn/docx/Zmyudq0FYoDVcsxDwHpcINyznfg]*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
