Wei-Chiu Chuang created RATIS-1100:
--------------------------------------
Summary: Make raft log gap error easier to troubleshoot
Key: RATIS-1100
URL: https://issues.apache.org/jira/browse/RATIS-1100
Project: Ratis
Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Wei-Chiu Chuang
Upon restart, Ozone Manager failed to start and emitted the following error:
{code:java}
2020-10-19 12:04:10,639 INFO org.apache.ratis.server.raftlog.segmented.LogSegment: Successfully read 7553 entries from segment file /var/lib/hadoop-ozone/fake_om/ratis/1b9ac7ae-cd52-3ab1-8089-942f8267f22a/current/log_25657965-25665517
2020-10-19 12:04:10,639 ERROR org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
java.io.IOException: java.lang.IllegalStateException
at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:289)
at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:301)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.start(OzoneManagerRatisServer.java:367)
at org.apache.hadoop.ozone.om.OzoneManager.start(OzoneManager.java:1138)
at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:125)
at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:79)
at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:67)
at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:38)
at picocli.CommandLine.executeUserObject(CommandLine.java:1933)
at picocli.CommandLine.access$1100(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2152)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:2530)
at picocli.CommandLine.parseWithHandler(CommandLine.java:2465)
at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:51)
Caused by: java.lang.IllegalStateException
at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:36)
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.validateAdding(SegmentedRaftLogCache.java:400)
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.addSegment(SegmentedRaftLogCache.java:405)
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:367)
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:249)
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:217)
at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:276)
at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:191)
at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:121)
at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:123)
at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:213)
{code}
The IllegalStateException carries no message. Only after reading the code and
checking the Ratis log directory did I realize there is a gap in the Ratis log
files (7659964 vs 25657965).
Filing this Jira to make the error message easier to understand, without the
need to look at the code.
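For illustration only, here is a minimal sketch of the kind of check and message that would have made the gap obvious. It is not the actual SegmentedRaftLogCache code: the class SegmentGapCheck, its nested Segment type, and the methods parse and validateAll are hypothetical, and this validateAdding is a stand-in that only borrows the name from the stack trace. It assumes closed segment files are named log_<startIndex>-<endIndex>, as in log_25657965-25665517 above, and that a segment must start at the index right after the previous segment's end. The file names in main are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Ratis.
final class SegmentGapCheck {
  // Assumption: closed segment files are named log_<startIndex>-<endIndex>.
  private static final Pattern CLOSED_SEGMENT = Pattern.compile("log_(\\d+)-(\\d+)");

  /** Minimal stand-in for a log segment: file name plus inclusive index range. */
  static final class Segment {
    final String name;
    final long start;
    final long end;

    Segment(String name, long start, long end) {
      this.name = name;
      this.start = start;
      this.end = end;
    }
  }

  /** Parses a closed segment file name into its start and end indices. */
  static Segment parse(String fileName) {
    Matcher m = CLOSED_SEGMENT.matcher(fileName);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a closed segment file name: " + fileName);
    }
    return new Segment(fileName, Long.parseLong(m.group(1)), Long.parseLong(m.group(2)));
  }

  /** Fails with a descriptive message if next does not start right after previous. */
  static void validateAdding(Segment previous, Segment next) {
    long expected = previous.end + 1;
    if (next.start != expected) {
      throw new IllegalStateException(String.format(
          "Non-contiguous raft log segments: %s ends at index %d, so the next segment "
              + "should start at index %d, but %s starts at index %d",
          previous.name, previous.end, expected, next.name, next.start));
    }
  }

  /** Sorts the segments by start index and checks each adjacent pair. */
  static void validateAll(List<String> fileNames) {
    List<Segment> segments = new ArrayList<>();
    for (String fileName : fileNames) {
      segments.add(parse(fileName));
    }
    segments.sort(Comparator.comparingLong((Segment s) -> s.start));
    for (int i = 1; i < segments.size(); i++) {
      validateAdding(segments.get(i - 1), segments.get(i));
    }
  }

  public static void main(String[] args) {
    // Made-up file names with a gap between index 199 and index 500.
    try {
      validateAll(Arrays.asList("log_0-99", "log_100-199", "log_500-599"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a message along these lines, the exception itself would name both segment files and the expected versus actual start index, so an operator could spot the missing range without opening the source.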