[
https://issues.apache.org/jira/browse/RATIS-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840749#comment-16840749
]
Tsz Wo Nicholas Sze commented on RATIS-534:
-------------------------------------------
[~ljain], Netty chooses to use true as the default so that the feature probably
has its value. We do see a problem in the server when it is very busy but the
same problem may not happen in client. At least, the problem won't happen when
the client is not under a high load.
On the other hand, printing a warning (or an other log messages) in client is
known to be annoying and, in some cases such as CLI, it is inappropriate.
How about printing the message in client using log.debug(..)?
> Leader reelection due to JVM GC pauses
> --------------------------------------
>
> Key: RATIS-534
> URL: https://issues.apache.org/jira/browse/RATIS-534
> Project: Ratis
> Issue Type: Bug
> Reporter: Lokesh Jain
> Assignee: Lokesh Jain
> Priority: Major
> Attachments: RATIS-534.001.patch, RATIS-534.002.patch,
> RATIS-534.003.patch
>
>
> Due to JVM GC pauses a follower is not able to process the append entries and
> as a result turns into a candidate. This leads to leader reelection. Below
> are the logs illustrating the scenario.
> Below are the leader ratis logs. The leader sends heartbeat to follower after
> every 500ms.
> {code:java}
> 2019-04-23 18:34:46,360 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2,
> previousLog=term: 2
> index: 211601
> , leaderCommit=211601, commits:
> [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:46,360 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
> 2019-04-23 18:34:46,862 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2,
> previousLog=term: 2
> index: 211601
> , leaderCommit=211601, commits:
> [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:46,862 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
> 2019-04-23 18:34:47,362 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2,
> previousLog=term: 2
> index: 211601
> , leaderCommit=211601, commits:
> [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:47,362 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
> 2019-04-23 18:34:47,862 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2,
> previousLog=term: 2
> index: 211601
> , leaderCommit=211601, commits:
> [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:47,863 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
> 2019-04-23 18:34:48,363 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2,
> previousLog=term: 2
> index: 211601
> , leaderCommit=211601, commits:
> [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:48,363 INFO org.apache.ratis.grpc.server.GrpcLogAppender:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER
> GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 ->
> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
> {code}
> Below are the follower logs showing a GC pause.
> {code:java}
> 2019-04-23 18:34:46,361 INFO org.apache.ratis.server.impl.RaftServerImpl:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 HEARTBEAT:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: receive
> appendEntries(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16, 2, (t:2, i:211601),
> 211601, false, commits[5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:46,362 INFO org.apache.ratis.server.impl.RaftServerImpl:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 HEARTBEAT:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: succeeded to handle AppendEntries.
> Reply:
> 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16->eaa2a877-0e7f-44f3-a514-12243ce2b925,true,SUCCESS,nextIndex:211602,term:2,followerCommit:211601
> 2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.FollowerState:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 changes to CANDIDATE,
> lastRpcTime:2221, electionTimeout:1124ms
> 2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RoleInfo:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: shutdown FollowerState
> 2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RaftServerImpl:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 changes role from
> FOLLOWER to CANDIDATE at term 2 for changeToCandidate
> 2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RaftServerImpl:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 HEARTBEAT:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: receive
> appendEntries(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16, 2, (t:2, i:211601),
> 211601, false, commits[5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601,
> 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601,
> eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
> 2019-04-23 18:34:48,583 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected
> pause in JVM or host machine (eg GC): pause of approximately 1774ms
> GC pool 'PS MarkSweep' had collection(s): count=1 time=1359ms
> GC pool 'PS Scavenge' had collection(s): count=1 time=532ms
> 2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RoleInfo:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: start LeaderElection
> 2019-04-23 18:34:48,598 INFO org.apache.ratis.server.impl.RaftServerImpl:
> eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 changes role from
> CANDIDATE to FOLLOWER at term 2 for appendEntries
> 2019-04-23 18:34:48,598 INFO org.apache.ratis.server.impl.RoleInfo:
> eaa2a877-0e7f-44f3-a514-12243ce2b925: shutdown LeaderElection
> {code}
> There is a GC pause of approx. 2 seconds which causes the leader election to
> start in the follower.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)