[
https://issues.apache.org/jira/browse/RATIS-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839529#comment-16839529
]
Lokesh Jain commented on RATIS-534:
-----------------------------------
v3 patch adds the check inside GrpcFactory static block. This prevents
excessive logging of the warning.
> Leader reelection due to JVM GC pauses
> --------------------------------------
>
> Key: RATIS-534
> URL: https://issues.apache.org/jira/browse/RATIS-534
> Project: Ratis
> Issue Type: Bug
> Reporter: Lokesh Jain
> Assignee: Lokesh Jain
> Priority: Major
> Attachments: RATIS-534.001.patch, RATIS-534.002.patch,
> RATIS-534.003.patch
>
>
> Due to JVM GC pauses a follower is not able to process the append entries and
> as a result turns into a candidate. This leads to leader reelection. Below
> are the logs illustrating the scenario.
> Below are the leader ratis logs. The leader sends heartbeat to follower after
> every 500ms.
> {code:java}
> 2019-04-23 18:34:46,360 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2,
> previousLog=term: 2
> index: 211601
> , leaderCommit=211601, commits:
> [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:46,360 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
> 2019-04-23 18:34:46,862 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2,
> previousLog=term: 2
> index: 211601
> , leaderCommit=211601, commits:
> [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:46,862 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
> 2019-04-23 18:34:47,362 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2,
> previousLog=term: 2
> index: 211601
> , leaderCommit=211601, commits:
> [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:47,362 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
> 2019-04-23 18:34:47,862 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2,
> previousLog=term: 2
> index: 211601
> , leaderCommit=211601, commits:
> [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:47,863 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
> 2019-04-23 18:34:48,363 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2,
> previousLog=term: 2
> index: 211601
> , leaderCommit=211601, commits:
> [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:48,363 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
> {code}
> Below are the follower logs showing a GC pause.
> {code:java}
> 2019-04-23 18:34:46,361 INFO org.apache.ratis.server.impl.RaftServerImpl:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 HEARTBEAT:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: receive
> appendEntries(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16, 2, (t:2, i:211601),
> 211601, false, commits[5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:46,362 INFO org.apache.ratis.server.impl.RaftServerImpl:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 HEARTBEAT:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: succeeded to handle AppendEntries.
> Reply:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16->eaa2a877-0e7f-44f3-a514-12243ce2b925,true,SUCCESS,nextIndex:211602,term:2,followerCommit:211601
> 2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.FollowerState:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 changes to CANDIDATE,
> lastRpcTime:2221, electionTimeout:1124ms
> 2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RoleInfo:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: shutdown FollowerState
> 2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RaftServerImpl:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 changes role from
> FOLLOWER to CANDIDATE at term 2 for changeToCandidate
> 2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RaftServerImpl:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 HEARTBEAT:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: receive
> appendEntries(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16, 2, (t:2, i:211601),
> 211601, false, commits[5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:48,583 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected
> pause in JVM or host machine (eg GC): pause of approximately 1774ms
> GC pool 'PS MarkSweep' had collection(s): count=1 time=1359ms
> GC pool 'PS Scavenge' had collection(s): count=1 time=532ms
> 2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RoleInfo:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: start LeaderElection
> 2019-04-23 18:34:48,598 INFO org.apache.ratis.server.impl.RaftServerImpl:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 changes role from
> CANDIDATE to FOLLOWER at term 2 for appendEntries
> 2019-04-23 18:34:48,598 INFO org.apache.ratis.server.impl.RoleInfo:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: shutdown LeaderElection
> {code}
> There is a GC pause of approx. 2 seconds which causes the leader election to
> start in the follower.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)