[
https://issues.apache.org/jira/browse/RATIS-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
guangbao zhao updated RATIS-2135:
---------------------------------
Description:
We found in the leader log of ratis that the follower replied leader nextIndex
= 645891323, and the leader also executed setNextIndex nextIndex:
updateUnconditionally 645892117 -> 645891323, but this process was repeated
continuously, causing the follower to become unavailable.
The following is the log of the leader node:
2024/08/05 23:59:05,943 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed (request=null,
errorCount=184); keep nextIndex (645892117) unchanged and retry.
2024/08/05 23:59:10,758 [grpc-default-executor-160] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
INCONSISTENCY reply with nextIndex 645891323, errorCount=185,
request=AppendEntriesRequest:cid=51646253,entriesCount=0
2024/08/05 23:59:10,758 [grpc-default-executor-160] INFO
org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
2024/08/05 23:59:10,943 [timer5] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed (request=null,
errorCount=186); keep nextIndex (645892117) unchanged and retry. (Repeated 2
times in the last 5.000s)
2024/08/05 23:59:10,943 [timer4] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed appendEntries
(Repeated 2 times in the last 5.000s):
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
RST_STREAM closed stream. HTTP/2 error code: CANCEL
2024/08/05 23:59:17,561 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
INCONSISTENCY reply with nextIndex 645891323, errorCount=187,
request=AppendEntriesRequest:cid=51646255,entriesCount=0
2024/08/05 23:59:17,561 [grpc-default-executor-98] INFO
org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
2024/08/05 23:59:17,670 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
RST_STREAM closed stream. HTTP/2 error code: CANCEL
2024/08/05 23:59:17,670 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed (request=null,
errorCount=188); keep nextIndex (645892117) unchanged and retry.
2024/08/05 23:59:21,426 [grpc-default-executor-160] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
INCONSISTENCY reply with nextIndex 645891323, errorCount=189,
request=AppendEntriesRequest:cid=51646257,entriesCount=0
2024/08/05 23:59:21,426 [grpc-default-executor-160] INFO
org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
was:
We found in the leader log of ratis that the follower replied leader nextIndex
= 645891323, and the leader also executed setNextIndex nextIndex:
updateUnconditionally 645892117 -> 645891323, but this process was repeated
continuously, causing the follower to become unavailable.
The following are relevant logs:
2024/08/05 23:59:05,943 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed (request=null,
errorCount=184); keep nextIndex (645892117) unchanged and retry.
2024/08/05 23:59:10,758 [grpc-default-executor-160] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
INCONSISTENCY reply with nextIndex 645891323, errorCount=185,
request=AppendEntriesRequest:cid=51646253,entriesCount=0
2024/08/05 23:59:10,758 [grpc-default-executor-160] INFO
org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
2024/08/05 23:59:10,943 [timer5] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed (request=null,
errorCount=186); keep nextIndex (645892117) unchanged and retry. (Repeated 2
times in the last 5.000s)
2024/08/05 23:59:10,943 [timer4] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed appendEntries
(Repeated 2 times in the last 5.000s):
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
RST_STREAM closed stream. HTTP/2 error code: CANCEL
2024/08/05 23:59:17,561 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
INCONSISTENCY reply with nextIndex 645891323, errorCount=187,
request=AppendEntriesRequest:cid=51646255,entriesCount=0
2024/08/05 23:59:17,561 [grpc-default-executor-98] INFO
org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
2024/08/05 23:59:17,670 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
RST_STREAM closed stream. HTTP/2 error code: CANCEL
2024/08/05 23:59:17,670 [grpc-default-executor-98] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed (request=null,
errorCount=188); keep nextIndex (645892117) unchanged and retry.
2024/08/05 23:59:21,426 [grpc-default-executor-160] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
INCONSISTENCY reply with nextIndex 645891323, errorCount=189,
request=AppendEntriesRequest:cid=51646257,entriesCount=0
2024/08/05 23:59:21,426 [grpc-default-executor-160] INFO
org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
> The leader keeps sending inconsistent entries repeatedly to followers.
> ----------------------------------------------------------------------
>
> Key: RATIS-2135
> URL: https://issues.apache.org/jira/browse/RATIS-2135
> Project: Ratis
> Issue Type: Bug
> Affects Versions: 3.0.1
> Reporter: guangbao zhao
> Priority: Major
>
> We found in the leader log of ratis that the follower replied leader
> nextIndex = 645891323, and the leader also executed setNextIndex nextIndex:
> updateUnconditionally 645892117 -> 645891323, but this process was repeated
> continuously, causing the follower to become unavailable.
> The following is the log of the leader node:
>
> 2024/08/05 23:59:05,943 [grpc-default-executor-98] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=184); keep nextIndex (645892117) unchanged and
> retry.
> 2024/08/05 23:59:10,758 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=185,
> request=AppendEntriesRequest:cid=51646253,entriesCount=0
> 2024/08/05 23:59:10,758 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
> 2024/08/05 23:59:10,943 [timer5] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=186); keep nextIndex (645892117) unchanged and
> retry. (Repeated 2 times in the last 5.000s)
> 2024/08/05 23:59:10,943 [timer4] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries (Repeated 2 times in the last 5.000s):
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
> RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/05 23:59:17,561 [grpc-default-executor-98] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=187,
> request=AppendEntriesRequest:cid=51646255,entriesCount=0
> 2024/08/05 23:59:17,561 [grpc-default-executor-98] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
> 2024/08/05 23:59:17,670 [grpc-default-executor-98] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/05 23:59:17,670 [grpc-default-executor-98] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=188); keep nextIndex (645892117) unchanged and
> retry.
> 2024/08/05 23:59:21,426 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=189,
> request=AppendEntriesRequest:cid=51646257,entriesCount=0
> 2024/08/05 23:59:21,426 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
--
This message was sent by Atlassian Jira
(v8.20.10#820010)