[
https://issues.apache.org/jira/browse/RATIS-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai updated RATIS-1241:
------------------------------------
Fix Version/s: 2.0.0
(was: 1.1.0)
> Leader unable to append logs to a recovering follower when its logs have been
> purged.
> -------------------------------------------------------------------------------------
>
> Key: RATIS-1241
> URL: https://issues.apache.org/jira/browse/RATIS-1241
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Aravindan Vijayan
> Assignee: Aravindan Vijayan
> Priority: Critical
> Fix For: 2.0.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> *Steps to reproduce*
> * Setup a 3 node ratis group.
> * Write some transactions into the quorum.
> * Bring 1 peer down.
> * Write more transactions into the quorum (other 2 nodes), take a snapshot at
> the last txn and purge logs from the remaining 2 nodes.
> * Start the node that was brought down.
> Leader falls into a possible irrecoverable state with respect to appending
> log entries to the follower.
> First time the follower comes back
> {code}
> 2020-12-15 15:42:01,069 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (GrpcLogAppender.java:resetClient(110)) -
> omNode-3@group-523986131536->omNode-2-GrpcLogAppender: Leader has not got in
> touch with Follower omNode-3@group-523986131536->omNode-2(c-1,m0,n409,
> attendVote=true, lastRpcSendTime=0, lastRpcResponseTime=8398) yet, just keep
> nextIndex unchanged and retry.
> 2020-12-15 15:42:01,570 [grpc-default-executor-5] INFO
> server.RaftServer$Division (ServerState.java:setLeader(260)) -
> omNode-2@group-523986131536: change Leader from null to omNode-3 at term 3
> for appendEntries, leader elected after 954ms
> 2020-12-15 15:42:01,570 [grpc-default-executor-5] INFO
> server.RaftServer$Division
> (RaftServerImpl.java:checkInconsistentAppendEntries(1249)) -
> omNode-2@group-523986131536: Failed appendEntries as previous log entry
> ((t:3, i:408)) is not found
> 2020-12-15 15:42:01,571 [grpc-default-executor-5] INFO
> server.RaftServer$Division
> (RaftServerImpl.java:checkInconsistentAppendEntries(1215)) -
> omNode-2@group-523986131536: inconsistency entries.
> Reply:omNode-3<-omNode-2#11187658:FAIL,INCONSISTENCY,nextIndex:205,term:2,followerCommit:203
> 2020-12-15 15:42:01,572 [grpc-default-executor-5] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(46)) -
> omNode-3@group-523986131536->omNode-2: nextIndex: updateUnconditionally 409
> -> 205
> 2020-12-15 15:42:01,572
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> ERROR leader.LogAppenderDaemon (LogAppenderDaemon.java:run(86)) -
> omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found:
> index = 205
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:283)
> at
> org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:143)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:210)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:144)
> at
> org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
> at java.lang.Thread.run(Thread.java:748)
> 2020-12-15 15:42:01,572
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO server.RaftServer$Division (LeaderStateImpl.java:restart(497)) -
> omNode-3@group-523986131536-LeaderStateImpl: Restarting GrpcLogAppender for
> omNode-3@group-523986131536->omNode-2
> {code}
> After the above, the leader falls into this error loop.
> {code}
> 2020-12-15 15:42:01,574
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> ERROR leader.LogAppenderDaemon (LogAppenderDaemon.java:run(86)) -
> omNode-3@group-523986131536->omNode-2-GrpcLogAp
> pender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found:
> index = 0
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:283)
> at
> org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:143)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:210)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:144)
> at
> org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
> at java.lang.Thread.run(Thread.java:748)
> 2020-12-15 15:42:01,575
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO server.RaftServer$Division (LeaderStateImpl.java:restart(497)) -
> omNode-3@group-523986131536-LeaderStateImp
> l: Restarting GrpcLogAppender for omNode-3@group-523986131536->omNode-2
> 2020-12-15 15:42:01,575
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO metrics.RatisMetrics (RatisMetrics.java:unregister(43)) - Unregistering
> Metrics Registry : ratis_grpc.log_a
> ppender.omNode-3@group-523986131536
> 2020-12-15 15:42:01,575
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) -
> raft.server.log.appender.snapshot.chunk.size.max
> = 16MB (=16777216) (default)
> 2020-12-15 15:42:01,575
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) -
> raft.server.log.appender.buffer.byte-limit = 3355
> 4432 (custom)
> 2020-12-15 15:42:01,576
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) -
> raft.server.log.appender.buffer.element-limit = 1
> 024 (custom)
> 2020-12-15 15:42:01,576
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO grpc.GrpcConfigKeys (ConfUtils.java:logGet(44)) -
> raft.grpc.server.leader.outstanding.appends.max = 128 (de
> fault)
> 2020-12-15 15:42:01,576
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) -
> raft.server.rpc.request.timeout = 3000ms (default
> )
> 2020-12-15 15:42:01,576
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) -
> raft.server.log.appender.install.snapshot.enabled
> = false (custom)
> 2020-12-15 15:42:01,576
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO metrics.RatisMetrics (RatisMetrics.java:lambda$create$0(36)) - Creating
> Metrics Registry : ratis_grpc.log_a
> ppender.omNode-3@group-523986131536
> 2020-12-15 15:42:01,577
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> ERROR leader.LogAppenderDaemon (LogAppenderDaemon.java:run(86)) -
> omNode-3@group-523986131536->omNode-2-GrpcLogAp
> pender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found:
> index = 0
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:283)
> at
> org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:143)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:210)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:144)
> at
> org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
> at java.lang.Thread.run(Thread.java:748)
> 2020-12-15 15:42:01,578
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon]
> INFO server.RaftServer$Division (LeaderStateImpl.java:restart(497)) -
> omNode-3@group-523986131536-LeaderStateImpl: Restarting GrpcLogAppender for
> omNode-3@group-523986131536->omNode-2
> {code}
--
This message was sent by Atlassian Jira
(v8.20.7#820007)