[ 
https://issues.apache.org/jira/browse/RATIS-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated RATIS-1241:
------------------------------------
    Fix Version/s: 2.0.0
                       (was: 1.1.0)

> Leader unable to append logs to a recovering follower when its logs have been 
> purged.
> -------------------------------------------------------------------------------------
>
>                 Key: RATIS-1241
>                 URL: https://issues.apache.org/jira/browse/RATIS-1241
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Aravindan Vijayan
>            Assignee: Aravindan Vijayan
>            Priority: Critical
>             Fix For: 2.0.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> *Steps to reproduce*
> * Set up a 3-node Ratis group.
> * Write some transactions into the quorum.
> * Bring 1 peer down.
> * Write more transactions into the quorum (the other 2 nodes), take a snapshot at 
> the last txn, and purge the logs on those 2 nodes.
> * Start the node that was brought down.
> The leader falls into a possibly unrecoverable state with respect to appending 
> log entries to the follower.
> The first time the follower comes back:
> {code}
> 2020-12-15 15:42:01,069 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (GrpcLogAppender.java:resetClient(110)) - 
> omNode-3@group-523986131536->omNode-2-GrpcLogAppender: Leader has not got in 
> touch with Follower omNode-3@group-523986131536->omNode-2(c-1,m0,n409, 
> attendVote=true, lastRpcSendTime=0, lastRpcResponseTime=8398) yet, just keep 
> nextIndex unchanged and retry.
> 2020-12-15 15:42:01,570 [grpc-default-executor-5] INFO  
> server.RaftServer$Division (ServerState.java:setLeader(260)) - 
> omNode-2@group-523986131536: change Leader from null to omNode-3 at term 3 
> for appendEntries, leader elected after 954ms
> 2020-12-15 15:42:01,570 [grpc-default-executor-5] INFO  
> server.RaftServer$Division 
> (RaftServerImpl.java:checkInconsistentAppendEntries(1249)) - 
> omNode-2@group-523986131536: Failed appendEntries as previous log entry 
> ((t:3, i:408)) is not found
> 2020-12-15 15:42:01,571 [grpc-default-executor-5] INFO  
> server.RaftServer$Division 
> (RaftServerImpl.java:checkInconsistentAppendEntries(1215)) - 
> omNode-2@group-523986131536: inconsistency entries. 
> Reply:omNode-3<-omNode-2#11187658:FAIL,INCONSISTENCY,nextIndex:205,term:2,followerCommit:203
> 2020-12-15 15:42:01,572 [grpc-default-executor-5] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(46)) - 
> omNode-3@group-523986131536->omNode-2: nextIndex: updateUnconditionally 409 
> -> 205
> 2020-12-15 15:42:01,572 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> ERROR leader.LogAppenderDaemon (LogAppenderDaemon.java:run(86)) - 
> omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: 
> index = 205
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:283)
>         at 
> org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:143)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:210)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:144)
>         at 
> org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-12-15 15:42:01,572 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  server.RaftServer$Division (LeaderStateImpl.java:restart(497)) - 
> omNode-3@group-523986131536-LeaderStateImpl: Restarting GrpcLogAppender for 
> omNode-3@group-523986131536->omNode-2
> {code}
> After the above, the leader falls into this error loop.
> {code}
> 2020-12-15 15:42:01,574 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> ERROR leader.LogAppenderDaemon (LogAppenderDaemon.java:run(86)) - 
> omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: 
> index = 0
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:283)
>         at 
> org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:143)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:210)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:144)
>         at 
> org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-12-15 15:42:01,575 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  server.RaftServer$Division (LeaderStateImpl.java:restart(497)) - 
> omNode-3@group-523986131536-LeaderStateImpl: Restarting GrpcLogAppender for 
> omNode-3@group-523986131536->omNode-2
> 2020-12-15 15:42:01,575 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  metrics.RatisMetrics (RatisMetrics.java:unregister(43)) - Unregistering 
> Metrics Registry : ratis_grpc.log_appender.omNode-3@group-523986131536
> 2020-12-15 15:42:01,575 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - 
> raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
> 2020-12-15 15:42:01,575 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - 
> raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
> 2020-12-15 15:42:01,576 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - 
> raft.server.log.appender.buffer.element-limit = 1024 (custom)
> 2020-12-15 15:42:01,576 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  grpc.GrpcConfigKeys (ConfUtils.java:logGet(44)) - 
> raft.grpc.server.leader.outstanding.appends.max = 128 (default)
> 2020-12-15 15:42:01,576 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - 
> raft.server.rpc.request.timeout = 3000ms (default)
> 2020-12-15 15:42:01,576 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - 
> raft.server.log.appender.install.snapshot.enabled = false (custom)
> 2020-12-15 15:42:01,576 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  metrics.RatisMetrics (RatisMetrics.java:lambda$create$0(36)) - Creating 
> Metrics Registry : ratis_grpc.log_appender.omNode-3@group-523986131536
> 2020-12-15 15:42:01,577 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> ERROR leader.LogAppenderDaemon (LogAppenderDaemon.java:run(86)) - 
> omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: 
> index = 0
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:283)
>         at 
> org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:143)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:210)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:144)
>         at 
> org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-12-15 15:42:01,578 
> [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] 
> INFO  server.RaftServer$Division (LeaderStateImpl.java:restart(497)) - 
> omNode-3@group-523986131536-LeaderStateImpl: Restarting GrpcLogAppender for 
> omNode-3@group-523986131536->omNode-2
> {code}
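
The traces above suggest that the follower asked for index 205 (and later index 0) while the leader, having taken a snapshot and purged its log, no longer held those entries. Below is a minimal sketch of the kind of check a leader could make before building an append request. Everything in it is an assumption made for illustration: the `PurgedLogSketch` class, the `LeaderLog` model and the `Action` names are hypothetical and are not Ratis APIs, the leader's first retained index of 409 is a guess based on the `nextIndex` values in the log, and this is not the patch that resolved RATIS-1241.

```java
/** Hypothetical model of the purged-log situation; not Ratis code. */
final class PurgedLogSketch {

    /** What the leader could do for a follower, given the follower's nextIndex. */
    enum Action { APPEND_ENTRIES, SNAPSHOT_REQUIRED, NOTHING_TO_SEND }

    /** Leader's log after a purge: only entries in [startIndex, nextIndex) still exist. */
    static final class LeaderLog {
        final long startIndex; // first index still retained after the purge
        final long nextIndex;  // index that the next new entry would receive

        LeaderLog(long startIndex, long nextIndex) {
            if (startIndex < 0 || nextIndex < startIndex) {
                throw new IllegalArgumentException("invalid log range");
            }
            this.startIndex = startIndex;
            this.nextIndex = nextIndex;
        }

        boolean contains(long index) {
            return index >= startIndex && index < nextIndex;
        }
    }

    /**
     * Decide before reading the entry, so that a purged index leads to a
     * snapshot-based catch-up instead of a "Log entry not found" failure.
     */
    static Action decide(LeaderLog log, long followerNextIndex) {
        if (followerNextIndex < log.startIndex) {
            return Action.SNAPSHOT_REQUIRED; // entry was purged; appending cannot work
        }
        if (!log.contains(followerNextIndex)) {
            return Action.NOTHING_TO_SEND;   // follower is already caught up
        }
        return Action.APPEND_ENTRIES;
    }

    public static void main(String[] args) {
        // Assumed for illustration: the leader retains only index 409 after the purge.
        LeaderLog leaderLog = new LeaderLog(409, 410);
        for (long followerNextIndex : new long[] {0, 205, 409, 410}) {
            System.out.println("follower nextIndex " + followerNextIndex
                + " -> " + decide(leaderLog, followerNextIndex));
        }
    }
}
```

In this model both 205 and 0 fall below the leader's first retained index, so the appender would be steered toward a snapshot-based catch-up rather than being restarted in a loop. How that catch-up would actually happen with `raft.server.log.appender.install.snapshot.enabled = false`, as configured in the log above, is outside the scope of the sketch.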


