[
https://issues.apache.org/jira/browse/RATIS-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871733#comment-17871733
]
Tsz-wo Sze commented on RATIS-2135:
-----------------------------------
bq. ... when nextIndex is reset to 645891323, shouldn't the next log entry be
sent to the follower from 645891323?
The index 645891323 was the first index it failed.
{code}
2024/08/06 00:00:00,400 [grpc-default-executor-160] WARN
org.apache.ratis.grpc.server.GrpcLogAppender:
node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
INCONSISTENCY reply with nextIndex 645891323, errorCount=207,
request=AppendEntriesRequest:cid=51646275,entriesCount=744,entries=(t:626,
i:645892117)...(t:626, i:645892860)
{code}
There were multiple appendEntries calls. The error message above only printed
out that last call. Note that
- 645892117 - 645891323 = 794.
I guess the entry at 645891323 might have size > 32MB. So that entry kept
failing. Would you be able to check the size entry at 645891323 ? If it was
<= 32MB, please check also the other entries to see how did the size become
33555985.
bq. Why does 645892117 keep sending repeatedly for several hours?
The problem kept repeating. The Raft algorithm was to retry indefinitely.
> The leader keeps sending inconsistent entries repeatedly to followers.
> ----------------------------------------------------------------------
>
> Key: RATIS-2135
> URL: https://issues.apache.org/jira/browse/RATIS-2135
> Project: Ratis
> Issue Type: Bug
> Affects Versions: 3.0.1
> Reporter: guangbao zhao
> Priority: Major
>
> We found in the leader log of ratis that the follower replied leader
> nextIndex = 645891323, and the leader also executed setNextIndex nextIndex:
> updateUnconditionally 645892117 -> 645891323, but this process was repeated
> continuously, causing the follower to become unavailable.
> The following is the log of the leader node:
>
> 2024/08/06 00:00:00,094 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:00,103 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=206); keep nextIndex (645892117) unchanged and
> retry.
> 2024/08/06 00:00:00,400 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=207,
> request=AppendEntriesRequest:cid=51646275,entriesCount=744,entries=(t:626,
> i:645892117)...(t:626, i:645892860)
> 2024/08/06 00:00:00,400 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892861 -> 645891323
> 2024/08/06 00:00:02,888 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=209,
> request=AppendEntriesRequest:cid=51646277,entriesCount=669,entries=(t:626,
> i:645892117)...(t:626, i:645892785)
> 2024/08/06 00:00:02,888 [grpc-default-executor-210] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892786 -> 645891323
> 2024/08/06 00:00:03,344 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=211,
> request=AppendEntriesRequest:cid=51646279,entriesCount=744,entries=(t:626,
> i:645892117)...(t:626, i:645892860)
> 2024/08/06 00:00:03,344 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892861 -> 645891323
> 2024/08/06 00:00:05,103 [timer1] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries (Repeated 4 times in the last 5.009s):
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
> RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:05,104 [timer2] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=212); keep nextIndex (645892117) unchanged and
> retry. (Repeated 4 times in the last 5.001s)
> 2024/08/06 00:00:05,833 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=213,
> request=AppendEntriesRequest:cid=51646281,entriesCount=669,entries=(t:626,
> i:645892117)...(t:626, i:645892785)
> 2024/08/06 00:00:05,833 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892786 -> 645891323
> 2024/08/06 00:00:05,996 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:05,996 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=214); keep nextIndex (645892117) unchanged and
> retry.
> 2024/08/06 00:00:10,792 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=215,
> request=AppendEntriesRequest:cid=51646283,entriesCount=0
> 2024/08/06 00:00:10,792 [grpc-default-executor-210] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
> 2024/08/06 00:00:10,996 [timer5] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=216); keep nextIndex (645892117) unchanged and
> retry. (Repeated 2 times in the last 5.000s)
> 2024/08/06 00:00:10,996 [timer4] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries (Repeated 2 times in the last 5.000s):
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
> RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:15,014 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=217,
> request=AppendEntriesRequest:cid=51646285,entriesCount=0
> 2024/08/06 00:00:15,014 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
--
This message was sent by Atlassian Jira
(v8.20.10#820010)