[
https://issues.apache.org/jira/browse/RATIS-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tsz-wo Sze updated RATIS-1674:
------------------------------
Attachment: RATIS-1674_idea.patch
> Frequent inconsistency entries during normal operations
> -------------------------------------------------------
>
> Key: RATIS-1674
> URL: https://issues.apache.org/jira/browse/RATIS-1674
> Project: Ratis
> Issue Type: Bug
> Reporter: Song Ziyang
> Priority: Major
> Attachments: RATIS-1674_idea.patch
>
>
> *After the grpc 1.48.1 release, we packaged Ratis from the master branch with
> thirdparty 1.0.2 and tested whether the out-of-direct-memory issue is fixed.
> Unfortunately, the error is reproduced again.*
> *See the error stack:*
> 2022-08-16 15:28:45,962
> [172.20.70.24_40010@group-000100000001->172.20.70.23_40010-GrpcLogAppender-LogAppenderDaemon]
> ERROR o.a.r.s.l.LogAppenderDaemon:86 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010-GrpcLogAppender-LogAppenderDaemon
> failed
> org.apache.ratis.thirdparty.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 2097152 byte(s) of direct memory (used: 17177772311, max: 17179869184)
> at org.apache.ratis.thirdparty.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:806)
> at org.apache.ratis.thirdparty.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:735)
> at org.apache.ratis.thirdparty.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:649)
> at org.apache.ratis.thirdparty.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:624)
> at org.apache.ratis.thirdparty.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:203)
> at org.apache.ratis.thirdparty.io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:187)
> at org.apache.ratis.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:136)
> at org.apache.ratis.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:126)
> at org.apache.ratis.thirdparty.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:396)
> at org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
> at org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:124)
> at org.apache.ratis.thirdparty.io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:51)
> at org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.writeRaw(MessageFramer.java:285)
> at org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.access$400(MessageFramer.java:43)
> at org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer$OutputStreamAdapter.write(MessageFramer.java:375)
> at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.write(CodedOutputStream.java:3005)
> at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeLazy(CodedOutputStream.java:3013)
> at org.apache.ratis.thirdparty.com.google.protobuf.ByteString$LiteralByteString.writeTo(ByteString.java:1411)
> at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeBytesNoTag(CodedOutputStream.java:2801)
> at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeBytes(CodedOutputStream.java:2775)
> at org.apache.ratis.proto.RaftProtos$StateMachineLogEntryProto.writeTo(RaftProtos.java:7869)
> at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessageNoTag(CodedOutputStream.java:2855)
> at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessage(CodedOutputStream.java:2824)
> at org.apache.ratis.proto.RaftProtos$LogEntryProto.writeTo(RaftProtos.java:9503)
> at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessageNoTag(CodedOutputStream.java:2855)
> at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessage(CodedOutputStream.java:2824)
> at org.apache.ratis.proto.RaftProtos$AppendEntriesRequestProto.writeTo(RaftProtos.java:17799)
> at org.apache.ratis.thirdparty.com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:83)
> at org.apache.ratis.thirdparty.io.grpc.protobuf.lite.ProtoInputStream.drainTo(ProtoInputStream.java:52)
> at org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.writeToOutputStream(MessageFramer.java:267)
> at org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:229)
> at org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:168)
> at org.apache.ratis.thirdparty.io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:141)
> at org.apache.ratis.thirdparty.io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:65)
> at org.apache.ratis.thirdparty.io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:37)
> at org.apache.ratis.thirdparty.io.grpc.internal.RetriableStream.sendMessage(RetriableStream.java:545)
> at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:520)
> at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:506)
> at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:374)
> at org.apache.ratis.grpc.server.GrpcLogAppender$StreamObservers.onNext(GrpcLogAppender.java:236)
> at org.apache.ratis.grpc.server.GrpcLogAppender.lambda$sendRequest$1(GrpcLogAppender.java:285)
> at java.util.Optional.map(Optional.java:215)
> at org.apache.ratis.grpc.server.GrpcLogAppender.sendRequest(GrpcLogAppender.java:284)
> at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:276)
> at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:160)
> at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
> at java.lang.Thread.run(Thread.java:748)
> 2022-08-16 15:28:45,963 [grpc-default-executor-2] WARN
> o.a.ratis.util.LogUtils:122 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010-AppendLogResponseHandler:
> Failed appendEntries:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: Client
> sendMessage() failed with Error
> 2022-08-16 15:28:45,963
> [172.20.70.24_40010@group-000100000001->172.20.70.23_40010-GrpcLogAppender-LogAppenderDaemon]
> INFO o.a.r.s.i.LeaderStateImpl:531 -
> 172.20.70.24_40010@group-000100000001-LeaderStateImpl: Restarting
> GrpcLogAppender for 172.20.70.24_40010@group-000100000001->172.20.70.23_40010
>
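> For reference, the numbers in the error message show the direct-memory pool was essentially exhausted at the moment of failure. A quick sanity check of the arithmetic, using only the values copied from the error message above:

```python
# Values taken verbatim from the OutOfDirectMemoryError message above.
requested = 2_097_152        # the 2 MiB chunk Netty tried to allocate
used = 17_177_772_311        # direct memory already counted as used
limit = 17_179_869_184      # max direct memory

headroom = limit - used
print(headroom)              # 2096873 bytes of headroom remain
assert limit == 16 * 1024**3      # the limit is exactly 16 GiB
assert headroom < requested       # hence incrementMemoryCounter fails
```

> So the pool was within one allocation (279 bytes short of a 2 MiB chunk) of its 16 GiB cap when the appender died.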
> {color:#ff0000}*However, we discovered some other abnormalities.*{color}
> *One follower often reports that it received inconsistent AppendEntries from
> the leader because the first entry in the AE already exists locally. After
> complaining about "inconsistent appendEntry" several times, the follower can
> still write logs to the RaftLog from time to time; the "inconsistent AE" and
> successful log writes are interleaved. See the follower log:*
> 2022-08-16 15:25:43,253
> [172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker] INFO
> o.a.r.s.r.s.SegmentedRaftLogWorker$StartLogSegment:629 -
> 172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker: created new log
> segment
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_inprogress_1376
> 2022-08-16 15:25:43,345 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1283) already exists (snapshotIndex:
> -1, commitIndex: 1393)
> 2022-08-16 15:25:43,345 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2960:FAIL-t1,INCONSISTENCY,nextIndex=1394,followerCommit=1393,matchIndex=-1
> 2022-08-16 15:25:43,358 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1287) already exists (snapshotIndex:
> -1, commitIndex: 1393)
> 2022-08-16 15:25:43,359 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2961:FAIL-t1,INCONSISTENCY,nextIndex=1394,followerCommit=1393,matchIndex=-1
> 2022-08-16 15:25:43,374 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1290) already exists (snapshotIndex:
> -1, commitIndex: 1393)
> 2022-08-16 15:25:43,374 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2962:FAIL-t1,INCONSISTENCY,nextIndex=1394,followerCommit=1393,matchIndex=-1
> 2022-08-16 15:25:43,387 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1294) already exists (snapshotIndex:
> -1, commitIndex: 1393)
> 2022-08-16 15:25:43,387 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2963:FAIL-t1,INCONSISTENCY,nextIndex=1394,followerCommit=1393,matchIndex=-1
> 2022-08-16 15:25:43,983 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1297) already exists (snapshotIndex:
> -1, commitIndex: 1393)
> 2022-08-16 15:25:43,984 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2964:FAIL-t1,INCONSISTENCY,nextIndex=1394,followerCommit=1393,matchIndex=-1
> {color:#0747a6}(successful writes after some inconsistent AEs){color}
> {color:#0747a6}2022-08-16 15:25:44,011 [172.20.70.23_40010-server-thread2]
> INFO o.a.r.s.r.s.SegmentedRaftLogWorker:430 -
> 172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker: Rolling segment
> log-1376_1394 to index:1394{color}
> {color:#0747a6}2022-08-16 15:25:44,017
> [172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker] INFO
> o.a.r.s.r.s.SegmentedRaftLogWorker$FinalizeLogSegment:586 -
> 172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker: Rolled log
> segment from
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_inprogress_1376
> to
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_1376-1394{color}
> {color:#0747a6}2022-08-16 15:25:44,028
> [172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker] INFO
> o.a.r.s.r.s.SegmentedRaftLogWorker$StartLogSegment:629 -
> 172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker: created new log
> segment
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_inprogress_1395{color}
> 2022-08-16 15:25:44,137 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1394) already exists (snapshotIndex:
> -1, commitIndex: 1395)
> 2022-08-16 15:25:44,137 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2966:FAIL-t1,INCONSISTENCY,nextIndex=1396,followerCommit=1395,matchIndex=-1
> 2022-08-16 15:25:44,158 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1394) already exists (snapshotIndex:
> -1, commitIndex: 1395)
> 2022-08-16 15:25:44,158 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2967:FAIL-t1,INCONSISTENCY,nextIndex=1396,followerCommit=1395,matchIndex=-1
> 2022-08-16 15:25:44,229 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.r.s.SegmentedRaftLogWorker:430 -
> 172.20.70.23_40010@group-000100000001-SegmentedRaftLogWorker: Rolling segment
> log-1395_1411 to index:1411
> 2022-08-16 15:25:44,247 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1475 - 172.20.70.23_40010@group-000100000001: Failed
> appendEntries as the first entry (index 1394) already exists (snapshotIndex:
> -1, commitIndex: 1403)
> 2022-08-16 15:25:44,248 [172.20.70.23_40010-server-thread2] INFO
> o.a.r.s.i.RaftServerImpl:1455 - 172.20.70.23_40010@group-000100000001:
> inconsistency entries.
> Reply:172.20.70.24_40010<-172.20.70.23_40010#2980:FAIL-t1,INCONSISTENCY,nextIndex=1404,followerCommit=1403,matchIndex=-1
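> The replies above correspond to the standard Raft consistency check a follower applies to each AppendEntries request. A heavily simplified sketch of that check (illustrative only, not the actual RaftServerImpl code):

```python
def check_append_entries(first_index, last_local_index):
    """Follower-side AppendEntries consistency check, simplified.

    Returns the reply type and the nextIndex hint the follower sends
    back, mirroring the INCONSISTENCY replies in the log above.
    """
    if first_index <= last_local_index:
        # The first entry of the request already exists locally:
        # reject and tell the leader where the local log really ends.
        return ("INCONSISTENCY", last_local_index + 1)
    # Otherwise the request lines up with the local log tail.
    return ("SUCCESS", first_index)

# Mirrors reply #2960 above: first entry 1283 already exists while the
# follower's log reaches index 1393, so it replies nextIndex=1394.
print(check_append_entries(1283, 1393))  # ('INCONSISTENCY', 1394)
```

> Note the follower's nextIndex hint (1394) never changes across replies #2960-#2964, yet the leader keeps sending batches starting below it.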
>
> *Meanwhile, the leader keeps reporting changes to this follower's nextIndex:*
> 2022-08-16 15:25:43,291
> [172.20.70.24_40010@group-000100000001-SegmentedRaftLogWorker] INFO
> o.a.r.s.r.s.SegmentedRaftLogWorker$FinalizeLogSegment:586 -
> 172.20.70.24_40010@group-000100000001-SegmentedRaftLogWorker: Rolled log
> segment from
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_inprogress_1430
> to
> /data/cluster/first-rest-test/DataNode/apache-iotdb/datanode/sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/current/log_1430-1444
> 2022-08-16 15:25:43,826
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$168/788892554@6273c5a4] WARN
> o.a.r.u.JvmPauseMonitor:124 - JvmPauseMonitor-172.20.70.24_40010: Detected
> pause in JVM or host machine (eg GC): pause of approximately 163309780ns.
> GC pool 'PS Scavenge' had collection(s): count=1 time=431ms
> 2022-08-16 15:25:43,826
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$168/788892554@756cf158] WARN
> o.a.r.u.JvmPauseMonitor:124 - JvmPauseMonitor-172.20.70.24_50010: Detected
> pause in JVM or host machine (eg GC): pause of approximately 163346070ns.
> GC pool 'PS Scavenge' had collection(s): count=1 time=431ms
> 2022-08-16 15:25:43,876 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1299 -> 1394
> 2022-08-16 15:25:44,014 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1396 -> 1394
> 2022-08-16 15:25:44,118 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1396 -> 1394
> 2022-08-16 15:25:44,129 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1422 -> 1394
> 2022-08-16 15:25:44,130 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1396 -> 1394
> 2022-08-16 15:25:44,141 [grpc-default-executor-0] INFO
> o.a.r.s.i.FollowerInfoImpl:48 -
> 172.20.70.24_40010@group-000100000001->172.20.70.23_40010: nextIndex:
> updateUnconditionally 1418 -> 1396
> *{color:#ff0000}I looked back at previous directOOM logs and found the same
> log pattern. Since the leader sends these "inconsistent AppendEntries" with
> almost zero waiting time (according to the follower reply timestamps), I
> suspect the directOOM may be related to the leader continuously sending
> "inconsistent AppendEntries", which consumes a lot of direct memory for
> network transmission.{color}*
> *Still, I'm confused why a follower reports inconsistency from time to time
> while it can still successfully append logs between these "inconsistent
> AppendEntries", and why the leader keeps sending "inconsistent AE" with zero
> waiting time. There is no obvious network congestion.*
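> A toy model of this suspicion: if each rejected AppendEntries is retried with no backoff, and each in-flight request pins a ~2 MiB framing buffer until gRPC drains it to the network, pinned memory grows whenever the send rate exceeds the drain rate. The rates below are invented; only the shape of the result matters:

```python
BUFFER = 2 * 1024 * 1024   # ~2 MiB direct buffer per serialized AppendEntries
send_per_tick = 50         # retries fired back-to-back with ~0 wait
drain_per_tick = 10        # requests actually flushed to the network per tick

in_flight = 0
for _ in range(100):
    in_flight += send_per_tick             # zero-backoff retries pile up
    in_flight -= min(in_flight, drain_per_tick)  # the network drains slower

print(in_flight * BUFFER // 1024**2, "MiB pinned")  # 8000 MiB pinned
```

> Under these made-up rates the backlog grows by 40 requests (80 MiB) per tick with no bound, which would eventually hit any direct-memory cap, matching the OOM pattern above.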
> *FYI, I've set RaftLog forceFlush to false. {color:#172b4d}I'm not sure
> whether this asynchronous disk-flush strategy is the cause of the above
> abnormalities.{color}*
> *{color:#172b4d}I'd be grateful if anyone can help with this issue.
> Thanks in advance!{color}*
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)