Attila Doroszlai created RATIS-788:
--------------------------------------
Summary: Server stuck due to exception while becoming leader
Key: RATIS-788
URL: https://issues.apache.org/jira/browse/RATIS-788
Project: Ratis
Issue Type: Bug
Components: server
Reporter: Attila Doroszlai
It seems a single-node Ratis server can get stuck if an exception is thrown
while it is becoming leader. {{LeaderElection}} ignores the exception because it
has already been shut down after the successful vote, so the server stays in
LEADER state but never becomes ready. I guess a 3-node Ratis group might be able
to recover.
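The failure mode can be sketched in isolation. The snippet below is not Ratis code: {{StuckLeaderSketch}}, {{Server}}, {{runElection}} and the {{electionClosing}} flag are hypothetical stand-ins, and the {{ConcurrentModificationException}} is triggered single-threaded by modifying a list inside {{forEach}}, only to mimic the exception seen in the log. It assumes the election marks itself as closing before the leader state is initialized, which is what the log suggests.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical model of the reported behaviour; names are illustrative only.
class StuckLeaderSketch {
  enum Role { FOLLOWER, CANDIDATE, LEADER }

  static class Server {
    Role role = Role.FOLLOWER;
    boolean leaderReady = false;
    boolean electionClosing = false;
    Throwable ignored;

    // Stand-in for leader-state initialization. Adding to the list while
    // iterating it makes ArrayList.forEach throw ConcurrentModificationException.
    void startLeaderState(List<Runnable> registry) {
      registry.forEach(r -> registry.add(() -> {}));
      leaderReady = true; // never reached when the line above throws
    }

    void runElection(List<Runnable> registry) {
      try {
        role = Role.CANDIDATE;
        // Single node: its own vote is a majority, so the election succeeds
        // and is shut down before the role change completes.
        electionClosing = true;
        role = Role.LEADER;
        startLeaderState(registry);
      } catch (RuntimeException e) {
        if (electionClosing) {
          ignored = e; // "safely ignored since this is already CLOSING"
        } else {
          throw e;
        }
      }
    }
  }

  static Server simulate() {
    List<Runnable> registry = new ArrayList<>();
    registry.add(() -> {}); // non-empty, so forEach runs the action once
    Server server = new Server();
    server.runElection(registry);
    return server;
  }

  public static void main(String[] args) {
    Server s = simulate();
    System.out.println("role=" + s.role + " ready=" + s.leaderReady
        + " ignored=" + s.ignored.getClass().getName());
  }
}
```

In this sketch the server ends up with role LEADER and {{leaderReady}} still false, and nothing retries the initialization. With a single node there is no other peer to time out and start a new election, which would explain why it stays stuck.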
{code}
2020-01-09 23:31:35,160 [Thread-95] INFO impl.FollowerState
(FollowerState.java:run(108)) -
6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-FollowerState: change
to CANDIDATE, lastRpcTime:1117ms, electionTimeout:1103ms
2020-01-09 23:31:35,161 [Thread-95] INFO impl.RoleInfo
(RoleInfo.java:shutdownFollowerState(121)) -
6b60526e-eae6-4f33-854d-fa396187085c: shutdown FollowerState
2020-01-09 23:31:35,161 [Thread-95] INFO impl.RaftServerImpl
(RaftServerImpl.java:setRole(173)) -
6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E: changes role from
FOLLOWER to CANDIDATE at term 0 for changeToCandidate
2020-01-09 23:31:35,165 [Thread-95] INFO impl.RoleInfo
(RoleInfo.java:updateAndGet(143)) - 6b60526e-eae6-4f33-854d-fa396187085c: start
LeaderElection
2020-01-09 23:31:35,176
[6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] INFO
impl.LeaderElection (LeaderElection.java:askForVotes(206)) -
6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1: begin
an election at term 1 for -1:
[6b60526e-eae6-4f33-854d-fa396187085c:localhost:9872], old=null
2020-01-09 23:31:35,177
[6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] INFO
impl.RoleInfo (RoleInfo.java:shutdownLeaderElection(134)) -
6b60526e-eae6-4f33-854d-fa396187085c: shutdown LeaderElection
2020-01-09 23:31:35,178
[6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] INFO
impl.RaftServerImpl (RaftServerImpl.java:setRole(173)) -
6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E: changes role from
CANDIDATE to LEADER at term 1 for changeToLeader
2020-01-09 23:31:35,178
[6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] INFO
impl.RaftServerImpl (ServerState.java:setLeader(255)) -
6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E: change Leader from
null to 6b60526e-eae6-4f33-854d-fa396187085c at term 1 for becomeLeader, leader
elected after 1269ms
2020-01-09 23:31:35,183
[6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] INFO
server.RaftServerConfigKeys (ConfUtils.java:logGet(43)) -
raft.server.staging.catchup.gap = 1000 (default)
2020-01-09 23:31:35,185
[6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] INFO
server.RaftServerConfigKeys (ConfUtils.java:logGet(43)) -
raft.server.rpc.sleep.time = 25ms (default)
2020-01-09 23:31:35,217
[6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] INFO
impl.LeaderElection (LeaderElection.java:run(165)) -
6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1:
ConcurrentModificationException is safely ignored since this is already CLOSING
java.util.ConcurrentModificationException
    at java.util.ArrayList.forEach(ArrayList.java:1260)
    at org.apache.ratis.metrics.impl.MetricRegistriesImpl.lambda$create$1(MetricRegistriesImpl.java:66)
    at org.apache.ratis.metrics.impl.RefCountingMap.lambda$put$0(RefCountingMap.java:51)
    at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
    at org.apache.ratis.metrics.impl.RefCountingMap.put(RefCountingMap.java:46)
    at org.apache.ratis.metrics.impl.MetricRegistriesImpl.create(MetricRegistriesImpl.java:59)
    at org.apache.ratis.server.metrics.RatisMetrics.create(RatisMetrics.java:45)
    at org.apache.ratis.server.metrics.RatisMetrics.getMetricRegistryForLogAppender(RatisMetrics.java:82)
    at org.apache.ratis.server.metrics.LogAppenderMetrics.<init>(LogAppenderMetrics.java:32)
    at org.apache.ratis.server.impl.LeaderState.<init>(LeaderState.java:221)
    at org.apache.ratis.server.impl.RoleInfo.startLeaderState(RoleInfo.java:94)
    at org.apache.ratis.server.impl.RaftServerImpl.changeToLeader(RaftServerImpl.java:348)
    at org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:238)
    at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:161)
    at java.lang.Thread.run(Thread.java:748)
...
2020-01-09 23:31:48,567 ...
6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E is in LEADER state but
not ready yet.
{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)