[
https://issues.apache.org/jira/browse/RATIS-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608296#comment-17608296
]
Song Ziyang commented on RATIS-1674:
------------------------------------
Hi, we have new discoveries on this issue. In a recent run, this problem
({*}inconsistent AE{*}) reproduced again with {*}OutOfDirectMemoryError{*}.
After analyzing the logs and code carefully, we strongly suspect that the
current gRPC Log Appender's *mechanism for sending AppendEntries and updating
nextIndex* can lead to a storm of inconsistent AEs & directOOM *under a
specific event sequence.*
DirectOOM often emerges when a *leader switch* happens in a cluster containing
a {*}slow follower{*}. When the newly elected leader sends the first AE to this
slow follower, the slow follower replies with an inconsistent AE (this is
expected), but provides a {color:#de350b}*wrong nextIndex (smaller than its
actual nextIndex)*{color}. The leader takes this wrong nextIndex and starts
sending lots of AEs (since waitTime=0), all of which are rejected as
"inconsistent AE" since the nextIndex mismatches. Each rejection in turn causes
*more wrong AppendEntries, and the system enters a vicious cycle*, finally
ending in directOOM.
We then investigated why this slow follower provided a wrong nextIndex.
It turns out that when the new leader's first AE arrives, *there are still
pending AEs from the previous leader in the queue.* The slow follower replies
to the new leader with its current nextIndex, but {*}this nextIndex soon
advances as the pending AEs are handled and added to the RaftLog{*}. By the
time the new leader's second (and subsequent) AEs arrive, the follower's actual
nextIndex has advanced, so these AEs are rejected as "inconsistent".
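The race described above can be illustrated with a minimal sketch (this is NOT actual Ratis code; the class and member names are made up for illustration): the INCONSISTENCY reply reports the follower's *current* next index, while appends queued by the previous leader are still in flight and will advance it shortly afterwards.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch (NOT actual Ratis code) of the race on the slow follower.
class FollowerSketch {
    final AtomicLong nextIndex = new AtomicLong(100); // follower's current next index (n1)

    // Pending entries from the previous leader, applied asynchronously.
    CompletableFuture<Void> applyPending(int count) {
        return CompletableFuture.runAsync(() -> nextIndex.addAndGet(count));
    }

    // The new leader's first AE mismatches, so the follower replies with
    // whatever nextIndex is *right now* (n1), even though the pending
    // appends will soon advance it to n2 > n1.
    long replyInconsistency() {
        return nextIndex.get(); // stale by the time the leader acts on it
    }
}
```

Any nextIndex the leader reads out of this reply is already outdated once `applyPending` completes, which is exactly the window the event sequence below exploits.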
To make matters worse, the above "inconsistent AE" rejections push the system
into a *vicious cycle.* After the leader receives the first AE reply from the
slow follower and obtains the wrong nextIndex, it sends a flood of AEs (up to
128) to the slow follower (with 0 waiting time; refer to the mayWait()
condition in GrpcLogAppender). These flooding AEs are all rejected as
"inconsistent". Since an inconsistent-AE reply {*}{color:#172b4d}forces the
leader to set nextIndex unconditionally{color}{*}, {color:#de350b}*these
rejected AEs cause the leader to reset nextIndex to the same wrong value over
and over again. Every time a wrong nextIndex is set, more inconsistent AEs are
sent and rejected, forcing the leader to update nextIndex wrongly yet
again.*{color} {color:#172b4d}It is a runaway feedback loop, causing the system
to hit directOOM in the end.{color}
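A simplified model of the leader side (again not the real GrpcLogAppender; all names are invented) shows why the unconditional update amplifies traffic: every INCONSISTENCY reply rewinds nextIndex to the value carried in the reply, even when the appender has already advanced past it, and each rewind triggers a full re-send.

```java
// Simplified model (NOT the real GrpcLogAppender) of the unconditional
// nextIndex update. m1 stale replies each carrying the same old index
// cause m1 rewinds and roughly m1 * batchSize wasted AppendEntries.
class LeaderSketch {
    long nextIndex = 500;   // where the appender has already advanced to
    int wastedSends = 0;    // AEs re-sent from a stale starting index

    void onInconsistencyReply(long replyNextIndex, int batchSize) {
        nextIndex = replyNextIndex; // unconditional reset: the problematic step
        wastedSends += batchSize;   // resume sending a full batch from there
    }
}
```

With m1 stale replies each triggering a batch of m2 re-sends, this model wastes roughly m1 * m2 AppendEntries, which matches the quantified event sequence in this comment.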
*A more detailed & quantified description of the above event sequence is
provided at the end.*
We then asked ourselves whether this event sequence is an unlikely corner
case; sadly, the answer is no. Leader switches happen from time to time in a
large cluster, and a long full GC can leave a follower lagging behind. When the
two conditions combine, a storm of inconsistent AEs consumes the network
bandwidth, leading to cascading failures.
A more detailed & quantified description of the above event sequence:
# Originally, a cluster [A B C] with A as the leader, B as a normal follower,
and C as the slow follower.
# C is hundreds of log entries behind. (This lag may be caused by CPU/memory
resource shortage or simply a full GC.)
# Leader switches from A to B. (A has a full GC and times out, or crashes.)
# B sends the first AppendEntries to C, with nextIndex = B's commitIndex.
# C replies Inconsistent AE, and provides its current commit index n1 as
replyNextIndex
([https://github.com/apache/ratis/blob/ab9fc535dc6f48ba1d93687fc46022989beb6b0a/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1585]).
Notice, *n1 is a wrong nextIndex* that does not account for the pending AEs.
# B receives the reply, and starts sending AppendEntries beginning from n1.
According to the wait condition in
([https://github.com/apache/ratis/blob/ab9fc535dc6f48ba1d93687fc46022989beb6b0a/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java#L171]),
it can send up to maxPendingRequestsNum (default 128) AppendEntries RPCs with
0 waiting time. Suppose m1 RPCs are sent.
# However, the replyNextIndex n1 provided in step 5 is not correct. There are
pending appendEntries futures from the previous leader A that have not
finished. See
([https://github.com/apache/ratis/blob/ab9fc535dc6f48ba1d93687fc46022989beb6b0a/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1516]).
When these futures finish, C's commit index advances to n2, where n2 > n1.
# Therefore, the m1 RPCs sent in step 6 will all end up as "Inconsistent
Entries", per
([https://github.com/apache/ratis/blob/ab9fc535dc6f48ba1d93687fc46022989beb6b0a/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1575]).
C sends m1 replies to B, and these replies lead to a storm.
# When the first (1/m1) Inconsistent AppendEntriesReply arrives at leader B, B
sets nextIndex directly to n2
([https://github.com/apache/ratis/blob/ab9fc535dc6f48ba1d93687fc46022989beb6b0a/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java#L377]),
and continues sending AppendEntries from n2. Everything works fine up to here.
# When the second (2/m1) Inconsistent AppendEntriesReply arrives at leader B,
nextIndex is {*}reset to n2 again{*}, and B {*}starts sending AEs from n2{*}
once more. Suppose m2 USELESS messages are sent this round.
# Step 10 repeats m1 times, causing nextIndex to be reset to an outdated value
over and over again. To make matters worse, roughly m1 * m2 useless
AppendEntries are sent.
# These m1 * m2 messages again end up as "Inconsistent Entries". Since there
is always a lag between replyNextIndex and the actual nextIndex (counting
unfinished futures), the replyNextIndex returned for these m1 * m2 messages is
often incorrect.
# Steps 8-12 repeat, with Inconsistent AppendEntries storming the network
channel. According to the log, {color:#de350b}*roughly 2500 Inconsistent AEs
were sent and replied to in about 60 seconds.*{color}
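The amplification in the steps above can be put into a back-of-the-envelope formula (our own estimate, not derived from Ratis code): the first wrong reply triggers m1 sends, and each of those m1 rejections rewinds nextIndex and triggers up to m2 more.

```java
// Back-of-the-envelope estimate of useless AppendEntries in one storm round.
class StormEstimate {
    static long uselessAppendEntries(int m1, int m2) {
        // the initial flood (m1) plus one re-sent batch per rejected RPC (m1 * m2)
        return (long) m1 + (long) m1 * m2;
    }
}
```

With both m1 and m2 at the default maxPendingRequestsNum of 128, a single round already accounts for over 16,000 RPCs in this estimate, so even a partially throttled appender can plausibly produce the observed ~2500 inconsistent AEs per minute.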
> Frequent inconsistency entries during normal operations
> -------------------------------------------------------
>
> Key: RATIS-1674
> URL: https://issues.apache.org/jira/browse/RATIS-1674
> Project: Ratis
> Issue Type: Bug
> Reporter: Song Ziyang
> Priority: Major
>
> *After the grpc 1.48.1 release, we packaged ratis from the master branch
> with thirdparty 1.0.2 and tested whether the out-of-direct-memory issue is
> fixed. Unfortunately, the error reproduced again.*
> *See the error stack:*
> 2022-08-16 15:28:45,962
> [172.20.70.24_40010@group-000100000001->172.20.70.23_40010-GrpcLogAppender-LogAppenderDaemon]
> ERROR o.a.r.s.l.LogAppenderDaemon:86 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010-GrpcLogAppender-LogAppenderDaemon
> failed
> org.apache.ratis.thirdparty.io.netty.util.internal.OutOfDirectMemoryError:
> failed to allocate 2097152 byte(s) of direct memory (used: 17177772311, max:
> 17179869184)
> at
> org.apache.ratis.thirdparty.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:806)
> at
> org.apache.ratis.thirdparty.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:735)
> at
> org.apache.ratis.thirdparty.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:649)
> at
> org.apache.ratis.thirdparty.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:624)
> at
> org.apache.ratis.thirdparty.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:203)
> at
> org.apache.ratis.thirdparty.io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:187)
> at
> org.apache.ratis.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:136)
> at
> org.apache.ratis.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:126)
> at
> org.apache.ratis.thirdparty.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:396)
> at
> org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
> at
> org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:124)
> at
> org.apache.ratis.thirdparty.io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:51)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.writeRaw(MessageFramer.java:285)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.access$400(MessageFramer.java:43)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer$OutputStreamAdapter.write(MessageFramer.java:375)
> at
> org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.write(CodedOutputStream.java:3005)
> at
> org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeLazy(CodedOutputStream.java:3013)
> at
> org.apache.ratis.thirdparty.com.google.protobuf.ByteString$LiteralByteString.writeTo(ByteString.java:1411)
> at
> org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeBytesNoTag(CodedOutputStream.java:2801)
> at
> org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeBytes(CodedOutputStream.java:2775)
> at
> org.apache.ratis.proto.RaftProtos$StateMachineLogEntryProto.writeTo(RaftProtos.java:7869)
> at
> org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessageNoTag(CodedOutputStream.java:2855)
> at
> org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessage(CodedOutputStream.java:2824)
> at
> org.apache.ratis.proto.RaftProtos$LogEntryProto.writeTo(RaftProtos.java:9503)
> at
> org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessageNoTag(CodedOutputStream.java:2855)
> at
> org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessage(CodedOutputStream.java:2824)
> at
> org.apache.ratis.proto.RaftProtos$AppendEntriesRequestProto.writeTo(RaftProtos.java:17799)
> at
> org.apache.ratis.thirdparty.com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:83)
> at
> org.apache.ratis.thirdparty.io.grpc.protobuf.lite.ProtoInputStream.drainTo(ProtoInputStream.java:52)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.writeToOutputStream(MessageFramer.java:267)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:229)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:168)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:141)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:65)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:37)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.RetriableStream.sendMessage(RetriableStream.java:545)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:520)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:506)
> at
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:374)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender$StreamObservers.onNext(GrpcLogAppender.java:236)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.lambda$sendRequest$1(GrpcLogAppender.java:285)
> at java.util.Optional.map(Optional.java:215)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.sendRequest(GrpcLogAppender.java:284)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:276)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:160)
> at
> org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
> at java.lang.Thread.run(Thread.java:748)
> 2022-08-16 15:28:45,963 [grpc-default-executor-2] WARN
> o.a.ratis.util.LogUtils:122 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010-AppendLogResponseHandler:
> Failed appendEntries:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: Client
> sendMessage() failed with Error
> 2022-08-16 15:28:45,963
> [172.20.70.24_40010@group-000100000001->172.20.70.23_40010-GrpcLogAppender-LogAppenderDaemon]
> INFO o.a.r.s.i.LeaderStateImpl:531 -
> 172.20.70.24_40010@group-000100000001-LeaderStateImpl: Restarting
> GrpcLogAppender for 172.20.70.24_40010@group-000100000001->172.20.70.23_40010
>
> {color:#ff0000}*However, we discovered some other abnormalities.*{color}
> *That is, one follower often reports that it receives inconsistent
> AppendEntries from the leader because the first entry in the AE already
> exists locally. After some complaints about "inconsistent appendEntry", this
> follower can still write logs to the RaftLog from time to time. The
> "inconsistent AE" and "successful log write" messages are interleaved. See
> the follower log:*
> 2022-08-16 15:25:43,253
> [172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker] INFO
> o.a.r.s.r.s.SegmentedRaftLogWorker$StartLogSegment:629 -
> 172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker: created new log
> segment
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_inprogress_1376
> 2022-08-16 15:25:43,345 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1283) already exists (snapshotIndex:
> -1, commitIndex: 1393)
> 2022-08-16 15:25:43,345 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2960:FAIL-t1,INCONSISTENCY,nextIndex=1394,followerCommit=1393,matchIndex=-1
> 2022-08-16 15:25:43,358 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1287) already exists (snapshotIndex:
> -1, commitIndex: 1393)
> 2022-08-16 15:25:43,359 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2961:FAIL-t1,INCONSISTENCY,nextIndex=1394,followerCommit=1393,matchIndex=-1
> 2022-08-16 15:25:43,374 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1290) already exists (snapshotIndex:
> -1, commitIndex: 1393)
> 2022-08-16 15:25:43,374 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2962:FAIL-t1,INCONSISTENCY,nextIndex=1394,followerCommit=1393,matchIndex=-1
> 2022-08-16 15:25:43,387 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1294) already exists (snapshotIndex:
> -1, commitIndex: 1393)
> 2022-08-16 15:25:43,387 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2963:FAIL-t1,INCONSISTENCY,nextIndex=1394,followerCommit=1393,matchIndex=-1
> 2022-08-16 15:25:43,983 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1297) already exists (snapshotIndex:
> -1, commitIndex: 1393)
> 2022-08-16 15:25:43,984 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2964:FAIL-t1,INCONSISTENCY,nextIndex=1394,followerCommit=1393,matchIndex=-1
> {color:#0747a6}(successfully write after some inconsistency AE){color}
> {color:#0747a6}2022-08-16 15:25:44,011 [172.20.70.23_40010-server-thread2]
> INFO o.a.r.s.r.s.SegmentedRaftLogWorker:430 -
> 172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker: Rolling segment
> log-1376_1394 to index:1394{color}
> {color:#0747a6}2022-08-16 15:25:44,017
> [172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker] INFO
> o.a.r.s.r.s.SegmentedRaftLogWorker$FinalizeLogSegment:586 -
> 172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker: Rolled log
> segment from
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_inprogress_1376
> to
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_1376-1394{color}
> {color:#0747a6}2022-08-16 15:25:44,028
> [172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker] INFO
> o.a.r.s.r.s.SegmentedRaftLogWorker$StartLogSegment:629 -
> 172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker: created new log
> segment
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_inprogress_1395{color}
> 2022-08-16 15:25:44,137 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1394) already exists (snapshotIndex:
> -1, commitIndex: 1395)
> 2022-08-16 15:25:44,137 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2966:FAIL-t1,INCONSISTENCY,nextIndex=1396,followerCommit=1395,matchIndex=-1
> 2022-08-16 15:25:44,158 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1394) already exists (snapshotIndex:
> -1, commitIndex: 1395)
> 2022-08-16 15:25:44,158 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2967:FAIL-t1,INCONSISTENCY,nextIndex=1396,followerCommit=1395,matchIndex=-1
> 2022-08-16 15:25:44,229 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.r.s.SegmentedRaftLogWorker:430 -
> 172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker: Rolling segment
> log-1395_1411 to index:1411
> 2022-08-16 15:25:44,247 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1394) already exists (snapshotIndex:
> -1, commitIndex: 1403)
> 2022-08-16 15:25:44,248 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2980:FAIL-t1,INCONSISTENCY,nextIndex=1404,followerCommit=1403,matchIndex=-1
>
> *Meanwhile, the leader keeps reporting changing this follower's nextIndex*
> 2022-08-16 15:25:43,291
> [172.20.70.24_40010@group-000100000001-SegmentedRaftLogWorker] INFO
> o.a.r.s.r.s.SegmentedRaftLogWorker$FinalizeLogSegment:586 -
> 172.20.70.24_40010@group-000100000001-SegmentedRaftLogWorker: Rolled log
> segment from
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_inprogress_1430
> to
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_1430-1444
> 2022-08-16 15:25:43,826
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$168/788892554@6273c5a4] WARN
> o.a.r.u.JvmPauseMonitor:124 - JvmPauseMonitor-172.20.70.24_40010: Detected
> pause in JVM or host machine (eg GC): pause of approximately 163309780ns.
> GC pool 'PS Scavenge' had collection(s): count=1 time=431ms
> 2022-08-16 15:25:43,826
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$168/788892554@756cf158] WARN
> o.a.r.u.JvmPauseMonitor:124 - JvmPauseMonitor-172.20.70.24_50010: Detected
> pause in JVM or host machine (eg GC): pause of approximately 163346070ns.
> GC pool 'PS Scavenge' had collection(s): count=1 time=431ms
> 2022-08-16 15:25:43,876 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1299 -> 1394
> 2022-08-16 15:25:44,014 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1396 -> 1394
> 2022-08-16 15:25:44,118 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1396 -> 1394
> 2022-08-16 15:25:44,129 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1422 -> 1394
> 2022-08-16 15:25:44,130 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1396 -> 1394
> 2022-08-16 15:25:44,141 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1418 -> 1396
> *{color:#ff0000}I looked back at previous directOOM logs and discovered this
> same log pattern. Since these "inconsistent AppendEntries" are sent from the
> leader with almost 0 waiting time (according to the follower reply
> timestamps), I guess directOOM may be related to the continuous
> "inconsistent AppendEntries" from the leader, which consume lots of direct
> memory for network transmission.{color}*
> {*}Though, I'm confused why a follower would report inconsistency from time
> to time while still successfully appending logs between these "inconsistent
> AppendEntries", and why the leader keeps sending "inconsistent AE" with zero
> waiting time. There is no obvious network congestion.{*}
> *FYI, I've set RaftLog forceFlush to false. {color:#172b4d}I'm not sure
> whether this async disk-flush strategy is the cause of the above
> abnormalities.{color}*
> *{color:#172b4d}I'll be grateful if anyone can help me with this issue.
> Thanks in advance!{color}*
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)