[ https://issues.apache.org/jira/browse/RATIS-2135 ]
guangbao zhao deleted comment on RATIS-2135:
--------------------------------------
was (Author: z-bb):
[~szetszwo] when nextIndex is reset to 645891323, shouldn't the next log entry
be sent to the follower from 645891323? Why does it keep sending from 645892117
repeatedly for several hours, and then resumes itself. The following is the log
before recovery:
{code:java}
2024/08/06 00:03:56,384 [timer5] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed appendEntries
(Repeated 2 times in the last 5.000s):
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
RST_STREAM closed stream. HTTP/2 error code: CANCEL
2024/08/06 00:03:56,384 [timer6] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed (request=null,
errorCount=325); keep nextIndex (645892117) unchanged and retry. (Repeated 2
times in the last 5.000s)
2024/08/06 00:04:00,032 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
INCONSISTENCY reply with nextIndex 645891323, errorCount=326,
request=AppendEntriesRequest:cid=51646394,entriesCount=0
2024/08/06 00:04:00,163 [grpc-default-executor-98] INFO
org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
setNextIndex nextIndex: updateUnconditionally 645892861 -> 645891323
2024/08/06 00:04:00,268 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
INCONSISTENCY reply with nextIndex 645891323, errorCount=327, request=null
2024/08/06 00:04:00,333 [grpc-default-executor-98] INFO
org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
2024/08/06 00:04:00,343 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
RST_STREAM closed stream. HTTP/2 error code: CANCEL
2024/08/06 00:04:00,343 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed (request=null,
errorCount=328); keep nextIndex (645891323) unchanged and retry.{code}
There was also an RST_STREAM closed stream exception during the period, but in
the end keep nextIndex (645891323) unchanged and retry. This situation can
sometimes be recovered by itself. Sometimes it cannot be recovered by itself,
and restarting the service is useless.
> The leader keeps sending inconsistent entries repeatedly to followers.
> ----------------------------------------------------------------------
>
> Key: RATIS-2135
> URL: https://issues.apache.org/jira/browse/RATIS-2135
> Project: Ratis
> Issue Type: Bug
> Affects Versions: 3.0.1
> Reporter: guangbao zhao
> Priority: Major
>
> We found in the leader log of ratis that the follower replied leader
> nextIndex = 645891323, and the leader also executed setNextIndex nextIndex:
> updateUnconditionally 645892117 -> 645891323, but this process was repeated
> continuously, causing the follower to become unavailable.
> The following is the log of the leader node:
>
> 2024/08/06 00:00:00,094 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:00,103 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=206); keep nextIndex (645892117) unchanged and
> retry.
> 2024/08/06 00:00:00,400 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=207,
> request=AppendEntriesRequest:cid=51646275,entriesCount=744,entries=(t:626,
> i:645892117)...(t:626, i:645892860)
> 2024/08/06 00:00:00,400 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892861 -> 645891323
> 2024/08/06 00:00:02,888 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=209,
> request=AppendEntriesRequest:cid=51646277,entriesCount=669,entries=(t:626,
> i:645892117)...(t:626, i:645892785)
> 2024/08/06 00:00:02,888 [grpc-default-executor-210] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892786 -> 645891323
> 2024/08/06 00:00:03,344 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=211,
> request=AppendEntriesRequest:cid=51646279,entriesCount=744,entries=(t:626,
> i:645892117)...(t:626, i:645892860)
> 2024/08/06 00:00:03,344 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892861 -> 645891323
> 2024/08/06 00:00:05,103 [timer1] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries (Repeated 4 times in the last 5.009s):
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
> RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:05,104 [timer2] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=212); keep nextIndex (645892117) unchanged and
> retry. (Repeated 4 times in the last 5.001s)
> 2024/08/06 00:00:05,833 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=213,
> request=AppendEntriesRequest:cid=51646281,entriesCount=669,entries=(t:626,
> i:645892117)...(t:626, i:645892785)
> 2024/08/06 00:00:05,833 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892786 -> 645891323
> 2024/08/06 00:00:05,996 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:05,996 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=214); keep nextIndex (645892117) unchanged and
> retry.
> 2024/08/06 00:00:10,792 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=215,
> request=AppendEntriesRequest:cid=51646283,entriesCount=0
> 2024/08/06 00:00:10,792 [grpc-default-executor-210] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
> 2024/08/06 00:00:10,996 [timer5] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=216); keep nextIndex (645892117) unchanged and
> retry. (Repeated 2 times in the last 5.000s)
> 2024/08/06 00:00:10,996 [timer4] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries (Repeated 2 times in the last 5.000s):
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
> RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:15,014 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=217,
> request=AppendEntriesRequest:cid=51646285,entriesCount=0
> 2024/08/06 00:00:15,014 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
--
This message was sent by Atlassian Jira
(v8.20.10#820010)