[
https://issues.apache.org/jira/browse/HDDS-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477168#comment-17477168
]
Attila Doroszlai commented on HDDS-6191:
----------------------------------------
{{c7514957}} votes for {{18b95dfb}}:
{noformat}
2022-01-07 01:24:07,792 [grpc-default-executor-0] INFO
server.RaftServer$Division (RaftServerImpl.java:requestVote(1046)) -
c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: receive
requestVote(ELECTION, 18b95dfb-8ef7-4011-a842-0dc8d0be8521, group-9EECFC4A6075,
2, (t:0, i:0))
2022-01-07 01:24:07,792 [grpc-default-executor-0] INFO impl.VoteContext
(VoteContext.java:log(48)) -
c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FOLLOWER: accept
ELECTION from 18b95dfb-8ef7-4011-a842-0dc8d0be8521: our priority 0 <=
candidate's priority 1
{noformat}
but just after 3ms starts new election due to RPC timeout:
{noformat}
2022-01-07 01:24:07,795
[c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO
impl.FollowerState (FollowerState.java:run(134)) -
c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState: change
to CANDIDATE, lastRpcElapsedTime:5172276476ns, electionTimeout:5172ms
2022-01-07 01:24:07,796
[c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO
server.RaftServer$Division (RaftServerImpl.java:setRole(290)) -
c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: changes role from
FOLLOWER to CANDIDATE at term 2 for changeToCandidate
2022-01-07 01:24:07,805
[c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-LeaderElection7] INFO
impl.LeaderElection (LeaderElection.java:askForVotes(306)) -
c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-LeaderElection7
ELECTION round 0: submit vote requests at term 3 for -1:
[18b95dfb-8ef7-4011-a842-0dc8d0be8521|rpc:10.1.0.7:39307|priority:1,
c7514957-7287-4cfe-8c11-55f780daee7f|rpc:10.1.0.7:42065|priority:0,
baceac6d-bbfe-42b4-9bde-396690ef56d9|rpc:10.1.0.7:38573|priority:0], old=null
{noformat}
Meanwhile {{18b95dfb}} becomes leader with vote from {{c7514957}}:
{noformat}
2022-01-07 01:24:07,805
[18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO
impl.LeaderElection (LeaderElection.java:logAndReturn(89)) -
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6:
ELECTION PASSED received 1 response(s) and 0 exception(s):
2022-01-07 01:24:07,814
[18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO
server.RaftServer$Division (RaftServerImpl.java:setRole(290)) -
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: changes role from
CANDIDATE to LEADER at term 2 for changeToLeader
2022-01-07 01:24:07,814
[18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO
ratis.XceiverServerRatis
(XceiverServerRatis.java:handleLeaderChangedNotification(866)) - Leader change
notification received for group: group-9EECFC4A6075 with new leaderId:
18b95dfb-8ef7-4011-a842-0dc8d0be8521
2022-01-07 01:24:07,815
[18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO
server.RaftServer$Division (ServerState.java:setLeader(285)) -
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: change Leader from
null to 18b95dfb-8ef7-4011-a842-0dc8d0be8521 at term 2 for becomeLeader, leader
elected after 9549ms
{noformat}
so pipeline is opened and SCM exits safe mode:
{noformat}
2022-01-07 01:24:07,818 [EventQueue-PipelineReportForPipelineReportHandler]
INFO pipeline.PipelineManagerImpl (PipelineManagerImpl.java:openPipeline(291))
- Pipeline Pipeline[ Id: b1f049ef-67e6-4b7d-89d8-9eecfc4a6075, ...
leaderId:18b95dfb-8ef7-4011-a842-0dc8d0be8521, ... moved to OPEN state
...
2022-01-07 01:24:07,819 [EventQueue-OpenPipelineForHealthyPipelineSafeModeRule]
INFO safemode.SCMSafeModeManager (SCMSafeModeManager.java:exitSafeMode(248)) -
SCM exiting safe mode.
{noformat}
But {{c7514957}}'s vote request makes {{18b95dfb}} step down from leadership:
{noformat}
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO impl.VoteContext
(VoteContext.java:log(48)) -
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LEADER: reject ELECTION
from c7514957-7287-4cfe-8c11-55f780daee7f: our last entry (t:2, i:0) >
candidate's last entry (t:0, i:0)
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO
server.RaftServer$Division (ServerState.java:setLeader(285)) -
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: change Leader from
18b95dfb-8ef7-4011-a842-0dc8d0be8521 to null at term 3 for updateCurrentTerm
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO
server.RaftServer$Division (RaftServerImpl.java:setRole(290)) -
18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: changes role from
LEADER to FOLLOWER at term 3 for candidate:c7514957-7287-4cfe-8c11-55f780daee7f
{noformat}
The result is that pipeline is open without leader, which is normally OK (as
client should retry until a new leader is elected). But the test makes an
assertion about leader being present, since it has waited for the pipeline to
open.
> Intermittent failure in TestDeleteWithSlowFollower
> --------------------------------------------------
>
> Key: HDDS-6191
> URL: https://issues.apache.org/jira/browse/HDDS-6191
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Major
>
> {code}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 45.297 s <<<
> FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower
> testDeleteKeyWithSlowFollower Time elapsed: 0.193 s <<< FAILURE!
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:87)
> at org.junit.Assert.assertTrue(Assert.java:42)
> at org.junit.Assert.assertNotNull(Assert.java:713)
> at org.junit.Assert.assertNotNull(Assert.java:723)
> at
> org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.testDeleteKeyWithSlowFollower(TestDeleteWithSlowFollower.java:237)
> {code}
> This was originally reported in HDDS-3762. Then it seemed to be OK for
> extended period, but appeared again:
> https://github.com/elek/ozone-build-results/blob/master/2021/04/30/7624/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/05/19/7973/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/07/15/8962/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/08/23/9889/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/07/10146/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/09/10201/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/13/10242/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2022/01/07/12409/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]