[ 
https://issues.apache.org/jira/browse/RATIS-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752748#comment-17752748
 ] 

Sammi Chen commented on RATIS-788:
----------------------------------

Hi [~adoroszlai], it this problem still present? 

> Server stuck due to exception while becoming leader
> ---------------------------------------------------
>
>                 Key: RATIS-788
>                 URL: https://issues.apache.org/jira/browse/RATIS-788
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Attila Doroszlai
>            Assignee: Rui Wang
>            Priority: Major
>         Attachments: 
> org.apache.hadoop.ozone.scm.TestSCMContainerPlacementPolicyMetrics-output.txt,
>  org.apache.hadoop.ozone.scm.TestSCMContainerPlacementPolicyMetrics.txt
>
>
> It seems if any exception happens while becoming leader, a single-node Ratis 
> group can get stuck: {{in LEADER state but not ready yet}}.  Or is there some 
> timeout that lets Ratis get out of this state?  I guess 3-node Ratis might be 
> able to recover.
> {{LeaderElection}} ignores the exception because it is already shut down 
> after successful vote.
> The {{ConcurrentModificationException}} which triggered it in this specific 
> case is being fixed in RATIS-789.
> {code}
> 2020-01-09 23:31:35,161 [Thread-95] INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:setRole(173)) - 
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E: changes role from  
> FOLLOWER to CANDIDATE at term 0 for changeToCandidate
> 2020-01-09 23:31:35,165 [Thread-95] INFO  impl.RoleInfo 
> (RoleInfo.java:updateAndGet(143)) - 6b60526e-eae6-4f33-854d-fa396187085c: 
> start LeaderElection
> 2020-01-09 23:31:35,176 
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] 
> INFO  impl.LeaderElection (LeaderElection.java:askForVotes(206)) - 
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1: 
> begin an election at term 1 for -1: 
> [6b60526e-eae6-4f33-854d-fa396187085c:localhost:9872], old=null
> 2020-01-09 23:31:35,177 
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] 
> INFO  impl.RoleInfo (RoleInfo.java:shutdownLeaderElection(134)) - 
> 6b60526e-eae6-4f33-854d-fa396187085c: shutdown LeaderElection
> 2020-01-09 23:31:35,178 
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] 
> INFO  impl.RaftServerImpl (RaftServerImpl.java:setRole(173)) - 
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E: changes role from 
> CANDIDATE to LEADER at term 1 for changeToLeader
> 2020-01-09 23:31:35,178 
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] 
> INFO  impl.RaftServerImpl (ServerState.java:setLeader(255)) - 
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E: change Leader from 
> null to 6b60526e-eae6-4f33-854d-fa396187085c at term 1 for becomeLeader, 
> leader elected after 1269ms
> ...
> 2020-01-09 23:31:35,217 
> [6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1] 
> INFO  impl.LeaderElection (LeaderElection.java:run(165)) - 
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E-LeaderElection1: 
> ConcurrentModificationException is safely ignored since this is already 
> CLOSING
> java.util.ConcurrentModificationException
>       at java.util.ArrayList.forEach(ArrayList.java:1260)
>       at 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl.lambda$create$1(MetricRegistriesImpl.java:66)
>       at 
> org.apache.ratis.metrics.impl.RefCountingMap.lambda$put$0(RefCountingMap.java:51)
>       at 
> java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
>       at 
> org.apache.ratis.metrics.impl.RefCountingMap.put(RefCountingMap.java:46)
>       at 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl.create(MetricRegistriesImpl.java:59)
>       at 
> org.apache.ratis.server.metrics.RatisMetrics.create(RatisMetrics.java:45)
>       at 
> org.apache.ratis.server.metrics.RatisMetrics.getMetricRegistryForLogAppender(RatisMetrics.java:82)
>       at 
> org.apache.ratis.server.metrics.LogAppenderMetrics.<init>(LogAppenderMetrics.java:32)
>       at org.apache.ratis.server.impl.LeaderState.<init>(LeaderState.java:221)
>       at 
> org.apache.ratis.server.impl.RoleInfo.startLeaderState(RoleInfo.java:94)
>       at 
> org.apache.ratis.server.impl.RaftServerImpl.changeToLeader(RaftServerImpl.java:348)
>       at 
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:238)
>       at 
> org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:161)
>       at java.lang.Thread.run(Thread.java:748)
> ...
> 2020-01-09 23:31:48,567 ... 
> 6b60526e-eae6-4f33-854d-fa396187085c@group-C5BA1605619E is in LEADER state 
> but not ready yet.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to