Xinyu Tan created RATIS-2203:
--------------------------------
Summary: Fixed issue where Ratis member changes are stuck
Key: RATIS-2203
URL: https://issues.apache.org/jira/browse/RATIS-2203
Project: Ratis
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Currently, in IoTDB, we use ratisclient to submit membership change requests,
and rely on the
[retrypolicy|https://github.com/apache/iotdb/blob/master/iotdb-core/consensus/src/main/java/org/apache/iotdb/consensus/ratis/RatisClient.java#L236-L249]
to ensure that membership change requests with longer durations can eventually
be successfully completed.
In a recent test, we discovered that Ratis might get stuck during a membership
change. After a series of troubleshooting steps, we have mostly determined that
this is a bug in the Ratis core.
Now, I will list the logs of the leader and the new replica to see if we can
analyze the cause of the error together.
leader
{code:java}
2024-11-29 11:50:49,166 [19-server-thread1] INFO o.a.r.s.i.RaftServerImpl:1296
- 19@group-000200000061: receive setConfiguration
SetConfigurationRequest:client-5FD616FE3614->19@group-000200000061, cid=1,
seq=null, RW, null, SET_UNCONDITIONALLY, servers:[4|172.16.2.3:10750,
19|172.16.2.18:10750], listeners:[]
2024-11-29 11:50:49,168 [19-server-thread1] INFO o.a.r.s.i.LeaderStateImpl:485
- 19@group-000200000061-LeaderStateImpl: startSetConfiguration
SetConfigurationRequest:client-5FD616FE3614->19@group-000200000061, cid=1,
seq=null, RW, null, SET_UNCONDITIONALLY, servers:[4|172.16.2.3:10750,
19|172.16.2.18:10750], listeners:[]
2024-11-29 11:50:49,208
[19@group-000200000061->4-GrpcLogAppender-LogAppenderDaemon] INFO
o.a.r.s.i.FollowerInfoImpl:154 - Follower 19@group-000200000061->4 acknowledged
installing snapshot
2024-11-29 11:50:49,215
[19@group-000200000061->4-GrpcLogAppender-LogAppenderDaemon] INFO
o.a.r.g.s.GrpcServerProtocolClient:65 - Build channel for 4|172.16.2.3:10750
2024-11-29 11:50:59,412 [timer2] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out appendEntries,
errorCount=1,
request=AppendEntriesRequest:cid=1,entriesCount=9198,entries=(t:1, i:0)...(t:1,
i:9197)
2024-11-29 11:50:59,732 [timer3] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=2, request=AppendEntriesRequest:cid=2,entriesCount=0
2024-11-29 11:51:00,238 [timer4] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=3, request=AppendEntriesRequest:cid=3,entriesCount=0
2024-11-29 11:51:00,740 [timer5] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=4, request=AppendEntriesRequest:cid=4,entriesCount=0
2024-11-29 11:51:01,906 [timer6] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=5, request=AppendEntriesRequest:cid=6,entriesCount=0
2024-11-29 11:51:02,406 [timer7] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=6, request=AppendEntriesRequest:cid=7,entriesCount=0
2024-11-29 11:51:02,908 [timer0] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=7, request=AppendEntriesRequest:cid=8,entriesCount=0
2024-11-29 11:51:03,086 [19-server-thread1] INFO o.a.r.s.i.RaftServerImpl:1296
- 19@group-000200000061: receive setConfiguration
SetConfigurationRequest:client-5FD616FE3614->19@group-000200000061, cid=1,
seq=null, RW, null, SET_UNCONDITIONALLY, servers:[4|172.16.2.3:10750,
19|172.16.2.18:10750], listeners:[]
2024-11-29 11:51:03,410 [timer1] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=8, request=AppendEntriesRequest:cid=9,entriesCount=0
2024-11-29 11:51:03,912 [timer2] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=9, request=AppendEntriesRequest:cid=10,entriesCount=0
2024-11-29 11:51:04,414 [timer3] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=10, request=AppendEntriesRequest:cid=11,entriesCount=0
2024-11-29 11:51:04,914 [timer4] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=11, request=AppendEntriesRequest:cid=12,entriesCount=0
2024-11-29 11:51:05,415 [timer5] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=12, request=AppendEntriesRequest:cid=13,entriesCount=0
2024-11-29 11:51:05,916 [timer6] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=13, request=AppendEntriesRequest:cid=14,entriesCount=0
2024-11-29 11:51:06,418 [timer7] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=14, request=AppendEntriesRequest:cid=15,entriesCount=0
2024-11-29 11:51:06,683
[19@group-000200000061->4-GrpcLogAppender-LogAppenderDaemon] WARN
o.a.r.g.s.GrpcLogAppender:293 - 19@group-000200000061->4-GrpcLogAppender: Wait
interrupted by java.lang.InterruptedException
2024-11-29 11:51:06,921 [timer0] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=15, request=AppendEntriesRequest:cid=16,entriesCount=0
2024-11-29 11:51:07,134 [19-server-thread1] INFO o.a.r.s.i.RaftServerImpl:1296
- 19@group-000200000061: receive setConfiguration
SetConfigurationRequest:client-5FD616FE3614->19@group-000200000061, cid=1,
seq=null, RW, null, SET_UNCONDITIONALLY, servers:[4|172.16.2.3:10750,
19|172.16.2.18:10750], listeners:[]
2024-11-29 11:51:07,135 [19-server-thread1] INFO o.a.r.s.i.LeaderStateImpl:485
- 19@group-000200000061-LeaderStateImpl: startSetConfiguration
SetConfigurationRequest:client-5FD616FE3614->19@group-000200000061, cid=1,
seq=null, RW, null, SET_UNCONDITIONALLY, servers:[4|172.16.2.3:10750,
19|172.16.2.18:10750], listeners:[]
2024-11-29 11:51:07,137
[19@group-000200000061->4-GrpcLogAppender-LogAppenderDaemon] INFO
o.a.r.s.i.FollowerInfoImpl:154 - Follower 19@group-000200000061->4 acknowledged
installing snapshot
2024-11-29 11:51:07,421 [timer1] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=16, request=AppendEntriesRequest:cid=17,entriesCount=0
2024-11-29 11:51:07,924 [timer2] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=17, request=AppendEntriesRequest:cid=18,entriesCount=0
2024-11-29 11:51:08,426 [timer3] WARN o.a.r.g.s.GrpcLogAppender:441 -
19@group-000200000061->4-GrpcLogAppender: Timed out HEARTBEAT appendEntries,
errorCount=18, request=AppendEntriesRequest:cid=19,entriesCount=0
2024-11-29 11:51:08,950 [grpc-default-executor-5] WARN
o.a.r.g.s.GrpcLogAppender$AppendLogResponseHandler:519 -
19@group-000200000061->4-AppendLogResponseHandler: received INCONSISTENCY reply
with nextIndex 1, errorCount=1,
request=AppendEntriesRequest:cid=1,entriesCount=9198,entries=(t:1, i:0)...(t:1,
i:9197)
2024-11-29 11:51:08,951 [grpc-default-executor-5] INFO
o.a.r.s.i.FollowerInfoImpl:70 - 19@group-000200000061->4: setNextIndex
nextIndex: updateUnconditionally 9198 -> 1
2024-11-29 11:51:08,972 [grpc-default-executor-5] INFO
o.a.r.g.s.GrpcLogAppender$AppendLogResponseHandler:550 -
19@group-000200000061->4-AppendLogResponseHandler: follower responses
appendEntries COMPLETED
2024-11-29 11:51:08,975 [grpc-default-executor-5] INFO
o.a.r.s.i.FollowerInfoImpl:70 - 19@group-000200000061->4: computeNextIndex
nextIndex: updateUnconditionally 9198 -> 0
2024-11-29 11:51:09,015 [grpc-default-executor-5] WARN
o.a.r.g.s.GrpcLogAppender$AppendLogResponseHandler:519 -
19@group-000200000061->4-AppendLogResponseHandler: received INCONSISTENCY reply
with nextIndex 9198, errorCount=1,
request=AppendEntriesRequest:cid=6,entriesCount=9197,entries=(t:1, i:1)...(t:1,
i:9197)
2024-11-29 11:51:09,022 [grpc-default-executor-1] INFO
o.a.r.g.s.GrpcLogAppender$AppendLogResponseHandler:550 -
19@group-000200000061->4-AppendLogResponseHandler: follower responses
appendEntries COMPLETED
{code}
new adding follower
{code:java}
2024-11-29 11:50:46,528 [4-groupManagement] INFO o.a.r.s.i.RaftServerImpl:405
- 4@group-000200000061: Successfully started.
2024-11-29 11:50:46,915 [4-server-thread1] INFO o.a.r.s.i.RaftServerImpl:1296
- 4@group-000200000061: receive setConfiguration
SetConfigurationRequest:client-5FD616FE3614->4@group-000200000061, cid=1,
seq=null, RW, null, SET_UNCONDITIONALLY, servers:[4|172.16.2.3:10750,
19|172.16.2.18:10750], listeners:[]
2024-11-29 11:50:50,886 [4-server-thread1] INFO o.a.r.s.i.RaftServerImpl:562 -
4@group-000200000061: set firstElectionSinceStartup to false for appendEntries
2024-11-29 11:50:50,886 [4-server-thread1] INFO
o.a.i.d.c.s.s.SchemaRegionStateMachine:75 - Current node [nodeId: 4] is no
longer the schema region leader [regionId: SchemaRegion[97]], the new leader is
[nodeId:19]
2024-11-29 11:50:50,886 [4-server-thread1] INFO o.a.r.s.impl.ServerState:272 -
4@group-000200000061: change Leader from null to 19 at term 1 for
appendEntries, leader elected after 4575ms
2024-11-29 11:50:50,887 [4-server-thread2] INFO o.a.r.s.impl.ServerState:386 -
4@group-000200000061: set configuration conf: {index: 0,
cur=peers:[19|172.16.2.18:10750]|listeners:[], old=null}
2024-11-29 11:50:50,887 [4-server-thread2] INFO
o.a.r.s.r.s.SegmentedRaftLogWorker:438 -
4@group-000200000061-SegmentedRaftLogWorker: Starting segment from index:0
2024-11-29 11:50:50,888 [4@group-000200000061-SegmentedRaftLogWorker] INFO
o.a.r.s.r.s.BufferedWriteChannel:61 - open log_inprogress_0 at position 0
2024-11-29 11:50:56,801 [4@group-000200000061-SegmentedRaftLogWorker] INFO
o.a.r.s.r.s.SegmentedRaftLogWorker$StartLogSegment:645 -
4@group-000200000061-SegmentedRaftLogWorker: created new log segment
/data1/iotdb/tablev2_rc5_1128_1fc99a6_1rep/data/datanode/consensus/schema_region/47474747-4747-4747-4747-000200000061/current/log_inprogress_0
2024-11-29 11:51:01,063 [4-server-thread21] INFO o.a.r.s.i.RaftServerImpl:1296
- 4@group-000200000061: receive setConfiguration
SetConfigurationRequest:client-5FD616FE3614->4@group-000200000061, cid=1,
seq=null, RW, null, SET_UNCONDITIONALLY, servers:[4|172.16.2.3:10750,
19|172.16.2.18:10750], listeners:[]
2024-11-29 11:51:05,118 [4-server-thread21] INFO o.a.r.s.i.RaftServerImpl:1296
- 4@group-000200000061: receive setConfiguration
SetConfigurationRequest:client-5FD616FE3614->4@group-000200000061, cid=1,
seq=null, RW, null, SET_UNCONDITIONALLY, servers:[4|172.16.2.3:10750,
19|172.16.2.18:10750], listeners:[]
2024-11-29 11:51:08,942 [4-server-thread22] INFO o.a.r.s.i.RaftServerImpl:1638
- 4@group-000200000061: Failed appendEntries as the first entry (index 0)
already exists (snapshotIndex: -1, commitIndex: 0)
2024-11-29 11:51:08,943 [4-server-thread22] INFO o.a.r.s.i.RaftServerImpl:1580
- 4@group-000200000061: appendEntries* reply
19<-4#1:FAIL-t1,INCONSISTENCY,nextIndex=1,followerCommit=-1,matchIndex=-1
2024-11-29 11:51:09,003 [4-server-thread2] INFO o.a.r.s.i.RaftServerImpl:1638
- 4@group-000200000061: Failed appendEntries as the first entry (index 1)
already exists (snapshotIndex: -1, commitIndex: 9197)
2024-11-29 11:51:09,004 [4-server-thread2] INFO o.a.r.s.i.RaftServerImpl:1580
- 4@group-000200000061: appendEntries* reply
19<-4#6:FAIL-t1,INCONSISTENCY,nextIndex=9198,followerCommit=9197,matchIndex=-1
{code}
>From the logs, we can see that the synchronization link from the leader to the
>new follower has been closed, so the logs cannot be committed. This causes the
>membership change thread to be stuck indefinitely while waiting for the
>follower to catch up.
Based on the logs above, I have a few suspicions:
- Why is the gRPC stream API on the leader side being completed? I haven't seen
any place on the follower side where this function is called.
- The commitIndex obtained in the checkInconsistentAppendEntries function on
the follower side seems inconsistent with the one obtained in
appendEntriesAsync. For example, in the following two lines of logs, the same
thread gets a commitIndex of -1 outside the lock, and an index of 0 inside the
lock. This causes the subsequent synchronization to be completely stuck. I'm
not sure if this is a concurrency issue, but perhaps we should move this line
inside the
[lock|ttps://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1591]?
{code:java}
2024-11-29 11:51:08,942 [4-server-thread22] INFO o.a.r.s.i.RaftServerImpl:1638
- 4@group-000200000061: Failed appendEntries as the first entry (index 0)
already exists (snapshotIndex: -1, commitIndex: 0)
2024-11-29 11:51:08,943 [4-server-thread22] INFO o.a.r.s.i.RaftServerImpl:1580
- 4@group-000200000061: appendEntries* reply
19<-4#1:FAIL-t1,INCONSISTENCY,nextIndex=1,followerCommit=-1,matchIndex=-1
{code}
What's your opinion? [~szetszwo]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)