[ 
https://issues.apache.org/jira/browse/HDDS-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477168#comment-17477168
 ] 

Attila Doroszlai commented on HDDS-6191:
----------------------------------------

{{c7514957}} votes for {{18b95dfb}}:

{noformat}
2022-01-07 01:24:07,792 [grpc-default-executor-0] INFO  
server.RaftServer$Division (RaftServerImpl.java:requestVote(1046)) - 
c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: receive 
requestVote(ELECTION, 18b95dfb-8ef7-4011-a842-0dc8d0be8521, group-9EECFC4A6075, 
2, (t:0, i:0))

2022-01-07 01:24:07,792 [grpc-default-executor-0] INFO  impl.VoteContext 
(VoteContext.java:log(48)) - 
c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FOLLOWER: accept 
ELECTION from 18b95dfb-8ef7-4011-a842-0dc8d0be8521: our priority 0 <= 
candidate's priority 1
{noformat}

but just after 3ms starts new election due to RPC timeout:

{noformat}
2022-01-07 01:24:07,795 
[c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO  
impl.FollowerState (FollowerState.java:run(134)) - 
c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState: change 
to CANDIDATE, lastRpcElapsedTime:5172276476ns, electionTimeout:5172ms

2022-01-07 01:24:07,796 
[c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO  
server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - 
c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: changes role from  
FOLLOWER to CANDIDATE at term 2 for changeToCandidate

2022-01-07 01:24:07,805 
[c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-LeaderElection7] INFO  
impl.LeaderElection (LeaderElection.java:askForVotes(306)) - 
c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-LeaderElection7 
ELECTION round 0: submit vote requests at term 3 for -1: 
[18b95dfb-8ef7-4011-a842-0dc8d0be8521|rpc:10.1.0.7:39307|priority:1, 
c7514957-7287-4cfe-8c11-55f780daee7f|rpc:10.1.0.7:42065|priority:0, 
baceac6d-bbfe-42b4-9bde-396690ef56d9|rpc:10.1.0.7:38573|priority:0], old=null
{noformat}

Meanwhile {{18b95dfb}} becomes leader with vote from {{c7514957}}:

{noformat}
2022-01-07 01:24:07,805 
[18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO  
impl.LeaderElection (LeaderElection.java:logAndReturn(89)) - 
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6: 
ELECTION PASSED received 1 response(s) and 0 exception(s):

2022-01-07 01:24:07,814 
[18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO  
server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - 
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: changes role from 
CANDIDATE to LEADER at term 2 for changeToLeader

2022-01-07 01:24:07,814 
[18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO  
ratis.XceiverServerRatis 
(XceiverServerRatis.java:handleLeaderChangedNotification(866)) - Leader change 
notification received for group: group-9EECFC4A6075 with new leaderId: 
18b95dfb-8ef7-4011-a842-0dc8d0be8521

2022-01-07 01:24:07,815 
[18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO  
server.RaftServer$Division (ServerState.java:setLeader(285)) - 
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: change Leader from 
null to 18b95dfb-8ef7-4011-a842-0dc8d0be8521 at term 2 for becomeLeader, leader 
elected after 9549ms
{noformat}

so pipeline is opened and SCM exits safe mode:

{noformat}
2022-01-07 01:24:07,818 [EventQueue-PipelineReportForPipelineReportHandler] 
INFO  pipeline.PipelineManagerImpl (PipelineManagerImpl.java:openPipeline(291)) 
- Pipeline Pipeline[ Id: b1f049ef-67e6-4b7d-89d8-9eecfc4a6075, ... 
leaderId:18b95dfb-8ef7-4011-a842-0dc8d0be8521, ... moved to OPEN state
...
2022-01-07 01:24:07,819 [EventQueue-OpenPipelineForHealthyPipelineSafeModeRule] 
INFO  safemode.SCMSafeModeManager (SCMSafeModeManager.java:exitSafeMode(248)) - 
SCM exiting safe mode.
{noformat}

But {{c7514957}}'s vote request makes {{18b95dfb}} step down from leadership:

{noformat}
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO  impl.VoteContext 
(VoteContext.java:log(48)) - 
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LEADER: reject ELECTION 
from c7514957-7287-4cfe-8c11-55f780daee7f: our last entry (t:2, i:0) > 
candidate's last entry (t:0, i:0)
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO  
server.RaftServer$Division (ServerState.java:setLeader(285)) - 
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: change Leader from 
18b95dfb-8ef7-4011-a842-0dc8d0be8521 to null at term 3 for updateCurrentTerm
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO  
server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - 
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: changes role from    
LEADER to FOLLOWER at term 3 for candidate:c7514957-7287-4cfe-8c11-55f780daee7f
{noformat}

The result is that pipeline is open without leader, which is normally OK (as 
client should retry until a new leader is elected).  But the test makes an 
assertion about leader being present, since it has waited for the pipeline to 
open.

> Intermittent failure in TestDeleteWithSlowFollower
> --------------------------------------------------
>
>                 Key: HDDS-6191
>                 URL: https://issues.apache.org/jira/browse/HDDS-6191
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Attila Doroszlai
>            Assignee: Attila Doroszlai
>            Priority: Major
>
> {code}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 45.297 s <<< 
> FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower
> testDeleteKeyWithSlowFollower  Time elapsed: 0.193 s  <<< FAILURE!
> java.lang.AssertionError
>       at org.junit.Assert.fail(Assert.java:87)
>       at org.junit.Assert.assertTrue(Assert.java:42)
>       at org.junit.Assert.assertNotNull(Assert.java:713)
>       at org.junit.Assert.assertNotNull(Assert.java:723)
>       at 
> org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.testDeleteKeyWithSlowFollower(TestDeleteWithSlowFollower.java:237)
> {code}
> This was originally reported in HDDS-3762.  Then it seemed to be OK for 
> extended period, but appeared again:
> https://github.com/elek/ozone-build-results/blob/master/2021/04/30/7624/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/05/19/7973/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/07/15/8962/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/08/23/9889/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/07/10146/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/09/10201/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/13/10242/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2022/01/07/12409/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to