[ 
https://issues.apache.org/jira/browse/RATIS-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872421#comment-17872421
 ] 

Tsz-wo Sze commented on RATIS-2135:
-----------------------------------

[~z-bb], it is really a bug. In your case,
 - LogAppenderBase.buffer size was 32MB 
(raft.server.log.appender.buffer.byte-limit)
 - gRPC maxInboundMessageSize was 32MB (raft.grpc.message.size.max)

When the buffer has 32MB data, the size of the appendEntriesProto built will be 
> 32MB.
{code:java}
//LogAppenderBase.nextAppendEntriesRequest(..)
    final List<LogEntryProto> protos;
    try {
      protos = buffer.pollList(..);                     // assume protos size 
is 32MB
    } catch (RaftLogIOException e) {
      ...
    } finally {
      ...
    }
    ...
    AppendEntriesRequestProto appendEntriesProto =      // BUG: protos + other 
info > 32MB
        leaderState.newAppendEntriesRequestProto(follower, protos, previous, 
callId);
    return ..;
{code}

As a work around, we may configure
- raft.grpc.message.size.max

to be slight larger than
- raft.server.log.appender.buffer.byte-limit,

say 1MB.

> The leader keeps sending inconsistent entries repeatedly to followers.
> ----------------------------------------------------------------------
>
>                 Key: RATIS-2135
>                 URL: https://issues.apache.org/jira/browse/RATIS-2135
>             Project: Ratis
>          Issue Type: Bug
>    Affects Versions: 3.0.1
>            Reporter: guangbao zhao
>            Priority: Major
>
> We found in the leader log of ratis that the follower replied leader 
> nextIndex = 645891323, and the leader also executed setNextIndex nextIndex: 
> updateUnconditionally 645892117 -> 645891323, but this process was repeated 
> continuously, causing the follower to become unavailable.
> The following is the log of the leader node:
>  
> 2024/08/06 00:00:00,094 [grpc-default-executor-210] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:00,103 [grpc-default-executor-210] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed 
> (request=null, errorCount=206); keep nextIndex (645892117) unchanged and 
> retry.
> 2024/08/06 00:00:00,400 [grpc-default-executor-160] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received 
> INCONSISTENCY reply with nextIndex 645891323, errorCount=207, 
> request=AppendEntriesRequest:cid=51646275,entriesCount=744,entries=(t:626, 
> i:645892117)...(t:626, i:645892860)
> 2024/08/06 00:00:00,400 [grpc-default-executor-160] INFO 
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1: 
> setNextIndex nextIndex: updateUnconditionally 645892861 -> 645891323
> 2024/08/06 00:00:02,888 [grpc-default-executor-210] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received 
> INCONSISTENCY reply with nextIndex 645891323, errorCount=209, 
> request=AppendEntriesRequest:cid=51646277,entriesCount=669,entries=(t:626, 
> i:645892117)...(t:626, i:645892785)
> 2024/08/06 00:00:02,888 [grpc-default-executor-210] INFO 
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1: 
> setNextIndex nextIndex: updateUnconditionally 645892786 -> 645891323
> 2024/08/06 00:00:03,344 [grpc-default-executor-160] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received 
> INCONSISTENCY reply with nextIndex 645891323, errorCount=211, 
> request=AppendEntriesRequest:cid=51646279,entriesCount=744,entries=(t:626, 
> i:645892117)...(t:626, i:645892860)
> 2024/08/06 00:00:03,344 [grpc-default-executor-160] INFO 
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1: 
> setNextIndex nextIndex: updateUnconditionally 645892861 -> 645891323
> 2024/08/06 00:00:05,103 [timer1] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed 
> appendEntries (Repeated 4 times in the last 5.009s): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:05,104 [timer2] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed 
> (request=null, errorCount=212); keep nextIndex (645892117) unchanged and 
> retry. (Repeated 4 times in the last 5.001s)
> 2024/08/06 00:00:05,833 [grpc-default-executor-160] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received 
> INCONSISTENCY reply with nextIndex 645891323, errorCount=213, 
> request=AppendEntriesRequest:cid=51646281,entriesCount=669,entries=(t:626, 
> i:645892117)...(t:626, i:645892785)
> 2024/08/06 00:00:05,833 [grpc-default-executor-160] INFO 
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1: 
> setNextIndex nextIndex: updateUnconditionally 645892786 -> 645891323
> 2024/08/06 00:00:05,996 [grpc-default-executor-160] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:05,996 [grpc-default-executor-160] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed 
> (request=null, errorCount=214); keep nextIndex (645892117) unchanged and 
> retry.
> 2024/08/06 00:00:10,792 [grpc-default-executor-210] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received 
> INCONSISTENCY reply with nextIndex 645891323, errorCount=215, 
> request=AppendEntriesRequest:cid=51646283,entriesCount=0
> 2024/08/06 00:00:10,792 [grpc-default-executor-210] INFO 
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1: 
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
> 2024/08/06 00:00:10,996 [timer5] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed 
> (request=null, errorCount=216); keep nextIndex (645892117) unchanged and 
> retry. (Repeated 2 times in the last 5.000s)
> 2024/08/06 00:00:10,996 [timer4] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed 
> appendEntries (Repeated 2 times in the last 5.000s): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:15,014 [grpc-default-executor-160] WARN 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received 
> INCONSISTENCY reply with nextIndex 645891323, errorCount=217, 
> request=AppendEntriesRequest:cid=51646285,entriesCount=0
> 2024/08/06 00:00:15,014 [grpc-default-executor-160] INFO 
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1: 
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to