[
https://issues.apache.org/jira/browse/RATIS-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871552#comment-17871552
]
guangbao zhao commented on RATIS-2135:
--------------------------------------
hi [~szetszwo] I found a log like this:
{code:java}
Aug 05, 2024 8:22:03 PM
org.apache.ratis.thirdparty.io.grpc.netty.NettyServerStream$TransportState
deframeFailed
WARNING: Exception processing message
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED:
gRPC message exceeds maximum size 33554432: 33555985
at
org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:529)
at
org.apache.ratis.thirdparty.io.grpc.internal.MessageDeframer.processHeader(MessageDeframer.java:392)
at
org.apache.ratis.thirdparty.io.grpc.internal.MessageDeframer.deliver(MessageDeframer.java:272)
at
org.apache.ratis.thirdparty.io.grpc.internal.MessageDeframer.deframe(MessageDeframer.java:178)
at
org.apache.ratis.thirdparty.io.grpc.internal.AbstractStream$TransportState.deframe(AbstractStream.java:211)
at
org.apache.ratis.thirdparty.io.grpc.internal.AbstractServerStream$TransportState.inboundDataReceived(AbstractServerStream.java:262)
at
org.apache.ratis.thirdparty.io.grpc.netty.NettyServerStream$TransportState.inboundDataReceived(NettyServerStream.java:210)
at
org.apache.ratis.thirdparty.io.grpc.netty.NettyServerHandler.onDataRead(NettyServerHandler.java:520)
at
org.apache.ratis.thirdparty.io.grpc.netty.NettyServerHandler.access$900(NettyServerHandler.java:111)
at
org.apache.ratis.thirdparty.io.grpc.netty.NettyServerHandler$FrameListener.onDataRead(NettyServerHandler.java:840)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onDataRead(DefaultHttp2ConnectionDecoder.java:307)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onDataRead(Http2InboundFrameLogger.java:48)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readDataFrame(DefaultHttp2FrameReader.java:415)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:250)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:393)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:453)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
at
org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
at
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at
org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at
org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at
org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
at
org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:509)
at
org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407)
at
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at
org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at
org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748) {code}
This should be the reason why the appendEntries call keeps failing.
StatusRuntimeException: CANCELLED: RST_STREAM closed the stream. HTTP/2 error
code: CANCEL, but I still don't understand why when nextIndex is reset to
645891323, shouldn't the next log entry be sent to the follower from 645891323?
Why does 645892117 keep sending repeatedly for several hours?
> The leader keeps sending inconsistent entries repeatedly to followers.
> ----------------------------------------------------------------------
>
> Key: RATIS-2135
> URL: https://issues.apache.org/jira/browse/RATIS-2135
> Project: Ratis
> Issue Type: Bug
> Affects Versions: 3.0.1
> Reporter: guangbao zhao
> Priority: Major
>
> We found in the leader log of ratis that the follower replied leader
> nextIndex = 645891323, and the leader also executed setNextIndex nextIndex:
> updateUnconditionally 645892117 -> 645891323, but this process was repeated
> continuously, causing the follower to become unavailable.
> The following is the log of the leader node:
>
> 2024/08/06 00:00:00,094 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:00,103 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=206); keep nextIndex (645892117) unchanged and
> retry.
> 2024/08/06 00:00:00,400 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=207,
> request=AppendEntriesRequest:cid=51646275,entriesCount=744,entries=(t:626,
> i:645892117)...(t:626, i:645892860)
> 2024/08/06 00:00:00,400 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892861 -> 645891323
> 2024/08/06 00:00:02,888 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=209,
> request=AppendEntriesRequest:cid=51646277,entriesCount=669,entries=(t:626,
> i:645892117)...(t:626, i:645892785)
> 2024/08/06 00:00:02,888 [grpc-default-executor-210] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892786 -> 645891323
> 2024/08/06 00:00:03,344 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=211,
> request=AppendEntriesRequest:cid=51646279,entriesCount=744,entries=(t:626,
> i:645892117)...(t:626, i:645892860)
> 2024/08/06 00:00:03,344 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892861 -> 645891323
> 2024/08/06 00:00:05,103 [timer1] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries (Repeated 4 times in the last 5.009s):
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
> RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:05,104 [timer2] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=212); keep nextIndex (645892117) unchanged and
> retry. (Repeated 4 times in the last 5.001s)
> 2024/08/06 00:00:05,833 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=213,
> request=AppendEntriesRequest:cid=51646281,entriesCount=669,entries=(t:626,
> i:645892117)...(t:626, i:645892785)
> 2024/08/06 00:00:05,833 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892786 -> 645891323
> 2024/08/06 00:00:05,996 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:05,996 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=214); keep nextIndex (645892117) unchanged and
> retry.
> 2024/08/06 00:00:10,792 [grpc-default-executor-210] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=215,
> request=AppendEntriesRequest:cid=51646283,entriesCount=0
> 2024/08/06 00:00:10,792 [grpc-default-executor-210] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
> 2024/08/06 00:00:10,996 [timer5] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-GrpcLogAppender: Follower failed
> (request=null, errorCount=216); keep nextIndex (645892117) unchanged and
> retry. (Repeated 2 times in the last 5.000s)
> 2024/08/06 00:00:10,996 [timer4] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: Failed
> appendEntries (Repeated 2 times in the last 5.000s):
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
> RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2024/08/06 00:00:15,014 [grpc-default-executor-160] WARN
> org.apache.ratis.grpc.server.GrpcLogAppender:
> node2@group-EE11FFDD70A4->node1-AppendLogResponseHandler: received
> INCONSISTENCY reply with nextIndex 645891323, errorCount=217,
> request=AppendEntriesRequest:cid=51646285,entriesCount=0
> 2024/08/06 00:00:15,014 [grpc-default-executor-160] INFO
> org.apache.ratis.server.leader.FollowerInfo: node2@group-EE11FFDD70A4->node1:
> setNextIndex nextIndex: updateUnconditionally 645892117 -> 645891323
--
This message was sent by Atlassian Jira
(v8.20.10#820010)