Hi Tsz-Wo,

Thanks so much for your reply. So I was wrong, but I still can't figure out why this happens.
Here are some logs from the partitioned server. This server was notified that it had become leader and tried to write a message through RaftServer.submitClientRequestAsync. At the same time, it lost connection with all followers. The server keeps calling RaftServer.submitClientRequestAsync as long as the call fails and it has not received a notification from StateMachine.notifyLeaderChanged or StateMachine.notifyNotLeader telling it to give up leadership. Would you mind giving me a hint about what is going on in this log? The Ratis version is 2.0.0.

[2021-05-13 03:11:30,048] [INFO] [main] [user-application] - sendAsync Continue cause org.apache.ratis.protocol.exceptions.LeaderNotReadyException: n2p8848hn2@group-ABB3109A44C1 is in LEADER state but not ready yet.
[2021-05-13 03:11:33,073] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@4bd7bd5d[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n1p8848hn1-GrpcLogAppender: appendEntries Timeout, request=AppendEntriesRequest:cid=90,entriesCount=1,lastEntry=(t:13, i:3497)
[2021-05-13 03:11:33,077] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@4bd7bd5d[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n3p8848hn3-GrpcLogAppender: appendEntries Timeout, request=AppendEntriesRequest:cid=90,entriesCount=1,lastEntry=(t:13, i:3497)
[2021-05-13 03:11:35,074] [INFO] [main] [user-application] - Failed to submit start entry: java.util.concurrent.TimeoutException
[2021-05-13 03:11:36,074] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@4bd7bd5d[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n1p8848hn1-GrpcLogAppender: HEARTBEAT appendEntries Timeout, request=AppendEntriesRequest:cid=381,entriesCount=0,lastEntry=null
[2021-05-13 03:11:36,075] [INFO] [main] [user-application] - sendAsync again
========== start repeating
[2021-05-13 03:11:36,078] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@4bd7bd5d[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n3p8848hn3-GrpcLogAppender: HEARTBEAT appendEntries Timeout, request=AppendEntriesRequest:cid=381,entriesCount=0,lastEntry=null
[2021-05-13 03:11:39,075] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@4bd7bd5d[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n1p8848hn1-GrpcLogAppender: HEARTBEAT appendEntries Timeout, request=AppendEntriesRequest:cid=672,entriesCount=0,lastEntry=null
[2021-05-13 03:11:39,077] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@4bd7bd5d[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n3p8848hn3-GrpcLogAppender: appendEntries Timeout, request=AppendEntriesRequest:cid=673,entriesCount=1,lastEntry=(t:13, i:3498)
[2021-05-13 03:11:39,077] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@4bd7bd5d[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n1p8848hn1-GrpcLogAppender: appendEntries Timeout, request=AppendEntriesRequest:cid=673,entriesCount=1,lastEntry=(t:13, i:3498)
[2021-05-13 03:11:41,042] [INFO] [main] [user-application] - Failed to submit start entry: java.util.concurrent.TimeoutException
[2021-05-13 03:11:42,043] [INFO] [main] [user-application] - sendAsync again
...
[2021-05-13 03:12:03,054] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@771ef45e[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n3p8848hn3-GrpcLogAppender: appendEntries Timeout, request=AppendEntriesRequest:cid=3005,entriesCount=1,lastEntry=(t:13, i:3502)
...
[2021-05-13 03:26:14,306] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@1dce73a8[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n1p8848hn1-GrpcLogAppender: HEARTBEAT appendEntries Timeout, request=AppendEntriesRequest:cid=74168,entriesCount=0,lastEntry=null
[2021-05-13 03:26:14,307] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@1dce73a8[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n3p8848hn3-GrpcLogAppender: HEARTBEAT appendEntries Timeout, request=AppendEntriesRequest:cid=74180,entriesCount=0,lastEntry=null
[2021-05-13 03:26:16,307] [INFO] [main] [user-application] - Failed to submit start entry: java.util.concurrent.TimeoutException
[2021-05-13 03:26:17,307] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@1dce73a8[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n1p8848hn1-GrpcLogAppender: HEARTBEAT appendEntries Timeout, request=AppendEntriesRequest:cid=74169,entriesCount=0,lastEntry=null
[2021-05-13 03:26:17,307] [INFO] [main] [user-application] - sendAsync again
[2021-05-13 03:26:17,308] [WARN] [java.util.concurrent.ThreadPoolExecutor$Worker@1dce73a8[State = -1, empty queue]] [org.apache.ratis.grpc.server.GrpcLogAppender] - n2p8848hn2@group-ABB3109A44C1->n3p8848hn3-GrpcLogAppender: HEARTBEAT appendEntries Timeout, request=AppendEntriesRequest:cid=74181,entriesCount=0,lastEntry=null
========== network healed, end repeating
[2021-05-13 03:26:17,552] [INFO] [grpc-default-executor-3] [org.apache.ratis.server.RaftServer$Division] - n2p8848hn2@group-ABB3109A44C1: change Leader from n2p8848hn2 to null at term 14 for updateCurrentTerm
[2021-05-13 03:26:17,552] [INFO] [grpc-default-executor-3] [org.apache.ratis.server.RaftServer$Division] - n2p8848hn2@group-ABB3109A44C1: changes role from LEADER to FOLLOWER at term 14 for appendEntries

By the way, there is no log output from LeaderStateImpl.checkLeadership.

Thanks again.
ly
