[
https://issues.apache.org/jira/browse/RATIS-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai updated RATIS-788:
-----------------------------------
Attachment:
org.apache.hadoop.ozone.scm.TestSCMContainerPlacementPolicyMetrics.txt
org.apache.hadoop.ozone.scm.TestSCMContainerPlacementPolicyMetrics-output.txt
> Server stuck due to exception while becoming leader
> ---------------------------------------------------
>
> Key: RATIS-788
> URL: https://issues.apache.org/jira/browse/RATIS-788
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Attila Doroszlai
> Priority: Major
> Attachments:
> org.apache.hadoop.ozone.scm.TestSCMContainerPlacementPolicyMetrics-output.txt,
> org.apache.hadoop.ozone.scm.TestSCMContainerPlacementPolicyMetrics.txt
>
>
> It seems single-node Ratis can get stuck if exception happens while becoming
> leader. {{LeaderElection}} ignores the exception because it is already shut
> down after successful vote. I guess 3-node Ratis might be able to recover.
> {code}
> 2020-01-09 23:31:35,160 [Thread-95] INFO impl.FollowerState
> (FollowerState.java:run(108)) -
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-FollowerState: change
> to CANDIDATE, lastRpcTime:1117ms, electionTimeout:1103ms
> 2020-01-09 23:31:35,161 [Thread-95] INFO impl.RoleInfo
> (RoleInfo.java:shutdownFollowerState(121)) -
> 6b60526e-eae6-4f33-854d-fa396187085c: shutdown FollowerState
> 2020-01-09 23:31:35,161 [Thread-95] INFO impl.RaftServerImpl
> (RaftServerImpl.java:setRole(173)) -
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E: changes role from
> FOLLOWER to CANDIDATE at term 0 for changeToCandidate
> 2020-01-09 23:31:35,165 [Thread-95] INFO impl.RoleInfo
> (RoleInfo.java:updateAndGet(143)) - 6b60526e-eae6-4f33-854d-fa396187085c:
> start LeaderElection
> 2020-01-09 23:31:35,176
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1]
> INFO impl.LeaderElection (LeaderElection.java:askForVotes(206)) -
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1:
> begin an election at term 1 for -1:
> [6b60526e-eae6-4f33-854d-fa396187085c:localhost:9872], old=null
> 2020-01-09 23:31:35,177
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1]
> INFO impl.RoleInfo (RoleInfo.java:shutdownLeaderElection(134)) -
> 6b60526e-eae6-4f33-854d-fa396187085c: shutdown LeaderElection
> 2020-01-09 23:31:35,178
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1]
> INFO impl.RaftServerImpl (RaftServerImpl.java:setRole(173)) -
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E: changes role from
> CANDIDATE to LEADER at term 1 for changeToLeader
> 2020-01-09 23:31:35,178
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1]
> INFO impl.RaftServerImpl (ServerState.java:setLeader(255)) -
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E: change Leader from
> null to 6b60526e-eae6-4f33-854d-fa396187085c at term 1 for becomeLeader,
> leader elected after 1269ms
> 2020-01-09 23:31:35,183
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1]
> INFO server.RaftServerConfigKeys (ConfUtils.java:logGet(43)) -
> raft.server.staging.catchup.gap = 1000 (default)
> 2020-01-09 23:31:35,185
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1]
> INFO server.RaftServerConfigKeys (ConfUtils.java:logGet(43)) -
> raft.server.rpc.sleep.time = 25ms (default)
> 2020-01-09 23:31:35,217
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1]
> INFO impl.LeaderElection (LeaderElection.java:run(165)) -
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1:
> ConcurrentModificationException is safely ignored since this is already
> CLOSING
> java.util.ConcurrentModificationException
> at java.util.ArrayList.forEach(ArrayList.java:1260)
> at
> org.apache.ratis.metrics.impl.MetricRegistriesImpl.lambda$create$1(MetricRegistriesImpl.java:66)
> at
> org.apache.ratis.metrics.impl.RefCountingMap.lambda$put$0(RefCountingMap.java:51)
> at
> java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
> at
> org.apache.ratis.metrics.impl.RefCountingMap.put(RefCountingMap.java:46)
> at
> org.apache.ratis.metrics.impl.MetricRegistriesImpl.create(MetricRegistriesImpl.java:59)
> at
> org.apache.ratis.server.metrics.RatisMetrics.create(RatisMetrics.java:45)
> at
> org.apache.ratis.server.metrics.RatisMetrics.getMetricRegistryForLogAppender(RatisMetrics.java:82)
> at
> org.apache.ratis.server.metrics.LogAppenderMetrics.<init>(LogAppenderMetrics.java:32)
> at org.apache.ratis.server.impl.LeaderState.<init>(LeaderState.java:221)
> at
> org.apache.ratis.server.impl.RoleInfo.startLeaderState(RoleInfo.java:94)
> at
> org.apache.ratis.server.impl.RaftServerImpl.changeToLeader(RaftServerImpl.java:348)
> at
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:238)
> at
> org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:161)
> at java.lang.Thread.run(Thread.java:748)
> ...
> 2020-01-09 23:31:48,567 ...
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E is in LEADER state
> but not ready yet.
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)