[
https://issues.apache.org/jira/browse/RATIS-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761707#comment-17761707
]
Xinyu Tan commented on RATIS-1879:
----------------------------------
[~szetszwo]. Yes, I'm willing to do it if needed~
However, I just researched the CorruptionPolicy parameter, and it's currently
set to the EXCEPTION policy by default. After I changed it to WARN_AND_RETURN,
the cluster was able to start. It skipped incomplete data and only printed
exception logs, which seems to meet my expectations. This means that even if we
can't guarantee atomicity during writes, the cluster can still start up when
encountering incomplete data on restart, minimizing the impact on users,
although some data loss may occur (if users have a strong requirement for this,
they should adjust the parameters to fsync every time).
What I'd like to know is whether we are using this parameter correctly in this
way. If so, does this mean we don't need to make any further modifications to
the kernel?
EXCEPTION:
{code:java}
2023-08-27 21:21:04,384 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 223 entries from segment file
/data/iotdb-enterprise-1.2.1.1-bin/sbin/./../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_0-222
2023-08-27 21:21:04,386 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 223:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-08-27 21:21:04,443 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 240 entries from segment file
/data/iotdb-enterprise-1.2.1.1-bin/sbin/./../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_223-462
2023-08-27 21:21:04,444 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 463:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-08-27 21:21:04,445 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 2 entries from segment file
/data/iotdb-enterprise-1.2.1.1-bin/sbin/./../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_463-464
2023-08-27 21:21:04,445 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 465:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-08-27 21:21:04,447 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 62 entries from segment file
/data/iotdb-enterprise-1.2.1.1-bin/sbin/./../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_465-526
2023-08-27 21:21:04,448 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 527:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-08-27 21:21:04,452 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 102 entries from segment file
/data/iotdb-enterprise-1.2.1.1-bin/sbin/./../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_527-628
2023-08-27 21:21:04,455 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 629:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-08-27 21:21:04,457 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 58 entries from segment file
/data/iotdb-enterprise-1.2.1.1-bin/sbin/./../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_629-686
2023-08-27 21:21:04,457 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 687:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-08-27 21:21:04,558 [main] ERROR o.a.i.c.ServerCommandLine:63 - Failed to
execute system command
java.util.concurrent.CompletionException: java.lang.IllegalStateException:
0@group-000000000000: Failed to initRaftLog.
at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at
java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1298)
at
java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1284)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:191)
at
org.apache.ratis.util.ConcurrentUtils.lambda$null$4(ConcurrentUtils.java:180)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalStateException: 0@group-000000000000: Failed to
initRaftLog.
at
org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:222)
at
org.apache.ratis.server.impl.ServerState.lambda$new$5(ServerState.java:161)
at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
at org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:177)
at
org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:338)
at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:188)
... 4 common frames omitted
Caused by: java.io.IOException: Read extra bytes after the terminator!
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.verifyTerminator(SegmentedRaftLogReader.java:251)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:293)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:203)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:131)
at
org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:131)
at
org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:164)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:381)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:241)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:214)
at org.apache.ratis.server.raftlog.RaftLogBase.open(RaftLogBase.java:251)
at
org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:239)
at
org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:220)
... 9 common frames omitted {code}
WARN_AND_RETURN:
{code:java}
2023-09-04 14:39:15,259 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 0:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-09-04 14:39:15,264 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 223 entries from segment file
/Users/txy/Desktop/apache-iotdb-1.2.0-SNAPSHOT-all-bin/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_0-222
2023-09-04 14:39:15,265 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 223:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-09-04 14:39:15,267 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 240 entries from segment file
/Users/txy/Desktop/apache-iotdb-1.2.0-SNAPSHOT-all-bin/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_223-462
2023-09-04 14:39:15,268 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 463:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-09-04 14:39:15,268 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 2 entries from segment file
/Users/txy/Desktop/apache-iotdb-1.2.0-SNAPSHOT-all-bin/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_463-464
2023-09-04 14:39:15,268 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 465:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-09-04 14:39:15,269 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 62 entries from segment file
/Users/txy/Desktop/apache-iotdb-1.2.0-SNAPSHOT-all-bin/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_465-526
2023-09-04 14:39:15,269 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 527:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-09-04 14:39:15,270 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 102 entries from segment file
/Users/txy/Desktop/apache-iotdb-1.2.0-SNAPSHOT-all-bin/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_527-628
2023-09-04 14:39:15,270 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 629:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-09-04 14:39:15,271 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 58 entries from segment file
/Users/txy/Desktop/apache-iotdb-1.2.0-SNAPSHOT-all-bin/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_629-686
2023-09-04 14:39:15,271 [0-impl-thread1] INFO o.a.r.s.impl.ServerState:434 -
0@group-000000000000: set configuration 687:
peers:[0|rpc:127.0.0.1:10720|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
2023-09-04 14:39:15,281 [0-impl-thread1] WARN o.a.r.s.r.s.LogSegment:146 -
Failed to read segment file
/Users/txy/Desktop/apache-iotdb-1.2.0-SNAPSHOT-all-bin/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_inprogress_687
(687-): only 240 entries read successfully
java.io.IOException: Read extra bytes after the terminator!
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.verifyTerminator(SegmentedRaftLogReader.java:251)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:293)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:203)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:131)
at
org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:131)
at
org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:164)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:381)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:241)
at
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:214)
at
org.apache.ratis.server.raftlog.RaftLogBase.open(RaftLogBase.java:251)
at
org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:239)
at
org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:220)
at
org.apache.ratis.server.impl.ServerState.lambda$new$5(ServerState.java:161)
at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
at
org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:177)
at
org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:338)
at
org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:188)
at
org.apache.ratis.util.ConcurrentUtils.lambda$null$4(ConcurrentUtils.java:180)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2023-09-04 14:39:15,282 [0-impl-thread1] INFO o.a.r.s.r.s.LogSegment:170 -
Successfully read 240 entries from segment file
/Users/txy/Desktop/apache-iotdb-1.2.0-SNAPSHOT-all-bin/sbin/../data/confignode/consensus/47474747-4747-4747-4747-000000000000/current/log_inprogress_687
2023-09-04 14:39:15,284 [0-impl-thread1] INFO
o.a.r.s.r.s.SegmentedRaftLogWorker:135 -
0@group-000000000000-SegmentedRaftLogWorker: flushIndex: setUnconditionally 0
-> 926
2023-09-04 14:39:15,284 [0-impl-thread1] INFO
o.a.r.s.r.s.SegmentedRaftLogWorker:135 -
0@group-000000000000-SegmentedRaftLogWorker: safeCacheEvictIndex:
setUnconditionally 0 -> 686
2023-09-04 14:39:15,325 [0-impl-thread1] INFO o.a.r.s.r.RaftLogBase:53 -
0@group-000000000000-SegmentedRaftLog: commitIndex: updateToMax old=686,
new=925, updated? true {code}
> Handle RaftLog corruption when unsafe flush is enabled.
> -------------------------------------------------------
>
> Key: RATIS-1879
> URL: https://issues.apache.org/jira/browse/RATIS-1879
> Project: Ratis
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 2.5.1
> Reporter: Song Ziyang
> Assignee: Tsz-wo Sze
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> During normal operations of the RaftServer, its containing virtual machine
> (VM) was unexpectedly shut down and subsequently restarted. Following the VM
> reboot, *our attempts to restart the RaftServer led to encountering the
> subsequent exception, indicating corruption in the Raft* {*}Log{*}{*}.{*}
> *The details of this exception please refer to
> [https://apache-iotdb.feishu.cn/docx/Zmyudq0FYoDVcsxDwHpcINyznfg]*
--
This message was sent by Atlassian Jira
(v8.20.10#820010)