[ 
https://issues.apache.org/jira/browse/HDDS-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar Asawa reassigned HDDS-8343:
-----------------------------------------

    Assignee: Nandakumar

> Failed to elect leader due to Ratis group not found
> ---------------------------------------------------
>
>                 Key: HDDS-8343
>                 URL: https://issues.apache.org/jira/browse/HDDS-8343
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Kaijie Chen
>            Assignee: Nandakumar
>            Priority: Major
>
> We encountered a problem during an upgrade.
> (The problem may not be directly caused by the upgrade.)
> Suppose we have 3 DataNodes forming a Ratis group.
> On DN1 and DN2, the pipeline was closed and the Ratis group has been deleted.
> On DN3, the Ratis group has not been deleted, so it keeps trying to start an 
> election but fails to elect a leader.
> In this situation, we cannot read data from this pipeline.
>  
> Here are the logs on DN3:
> {code:java}
> 2023-03-16 18:39:22,527 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C: changes role from  
> FOLLOWER to CANDIDATE at term 724 for changeToCandidate
> 2023-03-16 18:39:22,527 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=1b0b8153-71fd-437a-b486-bbea4a4fba6c.Reason : 
> 207b98d9-ad64-45a8-940f-504b514feff5 is in candidate state for 322168ms
> 2023-03-16 18:39:22,527 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO 
> org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.leaderelection.pre-vote = false (custom)
> 2023-03-16 18:39:22,527 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO 
> org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: 
> start 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310
> 2023-03-16 18:39:22,554 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310 
> ELECTION round 0: submit vote requests at term 725 for 0: 
> peers:[f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e|rpc:9.179.142.251:9858|dataStream:|priority:0|startupRole:FOLLOWER,
>  
> 33b49c34-caa2-4b4f-894e-dce7db4f97b9|rpc:9.180.20.222:9858|dataStream:|priority:1|startupRole:FOLLOWER,
>  
> 207b98d9-ad64-45a8-940f-504b514feff5|rpc:9.180.21.88:9858|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
>  old=null
> 2023-03-16 18:39:22,554 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.first-election.timeout.min = 5s (fallback to 
> raft.server.rpc.timeout.min)
> 2023-03-16 18:39:22,554 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to 
> raft.server.rpc.timeout.max)
> 2023-03-16 18:39:22,554 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310 got 
> exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-03-16 18:39:22,556 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310 got 
> exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e: group-BBEA4A4FBA6C not found.
> 2023-03-16 18:39:22,556 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310: 
> ELECTION REJECTED received 0 response(s) and 2 exception(s):
> 2023-03-16 18:39:22,556 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.impl.LeaderElection:   Exception 0: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-03-16 18:39:22,556 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.impl.LeaderElection:   Exception 1: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e: group-BBEA4A4FBA6C not found.
> 2023-03-16 18:39:22,556 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310 
> ELECTION round 0: result REJECTED
> 2023-03-16 18:39:22,557 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.RaftServer$Division: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C: changes role from 
> CANDIDATE to FOLLOWER at term 725 for REJECTED
> 2023-03-16 18:39:22,557 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.impl.RoleInfo: 
> 207b98d9-ad64-45a8-940f-504b514feff5: shutdown 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310
> 2023-03-16 18:39:22,557 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] 
> INFO org.apache.ratis.server.impl.RoleInfo: 
> 207b98d9-ad64-45a8-940f-504b514feff5: start 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState
> 2023-03-16 18:39:22,557 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO 
> org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.first-election.timeout.min = 5s (fallback to 
> raft.server.rpc.timeout.min)
> 2023-03-16 18:39:22,557 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO 
> org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to 
> raft.server.rpc.timeout.max)
> 2023-03-16 18:39:25,688 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO 
> org.apache.ratis.server.impl.FollowerState: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState: change 
> to CANDIDATE, lastRpcElapsedTime:5189254060ns, electionTimeout:5189ms
> 2023-03-16 18:39:25,688 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO 
> org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: 
> shutdown 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState
> 2023-03-16 18:39:25,688 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D: changes role from  
> FOLLOWER to CANDIDATE at term 706 for changeToCandidate
> 2023-03-16 18:39:25,688 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=4a5ed735-c797-45e8-a8e5-c8992d1fb40d.Reason : 
> 207b98d9-ad64-45a8-940f-504b514feff5 is in candidate state for 325322ms
> 2023-03-16 18:39:25,688 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO 
> org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.leaderelection.pre-vote = false (custom)
> 2023-03-16 18:39:25,688 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO 
> org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: 
> start 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311
> 2023-03-16 18:39:25,723 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311 
> ELECTION round 0: submit vote requests at term 707 for 0: 
> peers:[1e40274c-a4bd-4e3d-8479-59f8105ec408|rpc:100.76.18.99:9858|dataStream:|priority:1|startupRole:FOLLOWER,
>  
> 207b98d9-ad64-45a8-940f-504b514feff5|rpc:9.180.21.88:9858|dataStream:|priority:0|startupRole:FOLLOWER,
>  
> bcdf3bd5-7b8e-435d-b3fa-b3e29f0eb307|rpc:9.180.5.41:9858|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
>  old=null
> 2023-03-16 18:39:25,723 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.first-election.timeout.min = 5s (fallback to 
> raft.server.rpc.timeout.min)
> 2023-03-16 18:39:25,723 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to 
> raft.server.rpc.timeout.max)
> 2023-03-16 18:39:25,723 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311 got 
> exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-03-16 18:39:25,723 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311 got 
> exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-03-16 18:39:25,723 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311: 
> ELECTION REJECTED received 0 response(s) and 2 exception(s):
> 2023-03-16 18:39:25,723 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.impl.LeaderElection:   Exception 0: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-03-16 18:39:25,723 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.impl.LeaderElection:   Exception 1: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-03-16 18:39:25,724 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.impl.LeaderElection: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311 
> ELECTION round 0: result REJECTED
> 2023-03-16 18:39:25,724 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.RaftServer$Division: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D: changes role from 
> CANDIDATE to FOLLOWER at term 707 for REJECTED
> 2023-03-16 18:39:25,724 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.impl.RoleInfo: 
> 207b98d9-ad64-45a8-940f-504b514feff5: shutdown 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311
> 2023-03-16 18:39:25,724 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] 
> INFO org.apache.ratis.server.impl.RoleInfo: 
> 207b98d9-ad64-45a8-940f-504b514feff5: start 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState
> 2023-03-16 18:39:25,724 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO 
> org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.first-election.timeout.min = 5s (fallback to 
> raft.server.rpc.timeout.min)
> 2023-03-16 18:39:25,724 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO 
> org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to 
> raft.server.rpc.timeout.max)
> 2023-03-16 18:39:26,439 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO 
> org.apache.ratis.server.impl.FollowerState: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState: change 
> to CANDIDATE, lastRpcElapsedTime:5018766985ns, electionTimeout:5018ms
> 2023-03-16 18:39:26,439 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO 
> org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: 
> shutdown 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState
> 2023-03-16 18:39:26,439 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1: changes role from  
> FOLLOWER to CANDIDATE at term 706 for changeToCandidate
> 2023-03-16 18:39:26,439 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=5d8061d6-1692-4eb4-a604-3c37ea22a9c1.Reason : 
> 207b98d9-ad64-45a8-940f-504b514feff5 is in candidate state for 326082ms
> 2023-03-16 18:39:26,439 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO 
> org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.leaderelection.pre-vote = false (custom)
> 2023-03-16 18:39:26,439 
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO 
> org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: 
> start 
> 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-LeaderElection312
> {code}
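
To see which peers answered a vote request with "group not found", the DN3 log above can be scanned for that message. The following is only a rough sketch: `GroupNotFoundScan` and `missingGroups` are hypothetical names, not part of Ozone or Ratis, and the regular expression assumes the message text looks exactly like the lines in this report. Peers that failed with `UNAVAILABLE: io exception` are not matched, because that message does not name the peer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper only; not Ozone or Ratis code.
final class GroupNotFoundScan {
    // Assumed message shape: "<peer-uuid>: group-<12 hex chars> not found"
    private static final Pattern NOT_FOUND = Pattern.compile(
        "([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}): (group-[0-9A-F]{12}) not found");

    /** Maps each group id to the set of peers that reported it as missing. */
    static Map<String, Set<String>> missingGroups(List<String> logLines) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = NOT_FOUND.matcher(line);
            while (m.find()) {
                // group(1) is the peer id, group(2) is the Raft group id
                result.computeIfAbsent(m.group(2), k -> new LinkedHashSet<>())
                      .add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample lines copied from the log in this report
        List<String> lines = Arrays.asList(
            "org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: ",
            "f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e: group-BBEA4A4FBA6C not found.",
            "org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io ",
            "exception");
        System.out.println(missingGroups(lines));
    }
}
```

Because duplicates collapse into a set, repeated election rounds that log the same exception yield one entry per peer and group.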
>  
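
The rejected elections in the description above are consistent with the Raft majority rule. As a minimal sketch, assuming plain majority voting and using no Ratis classes (`QuorumSketch`, `majority`, and `canWin` are hypothetical names for illustration): a group of three needs two votes, and DN3 can count only its own vote when both vote requests end in exceptions.

```java
// Illustrative only; not Ratis code. Models the majority rule for the
// 3-node group described in this report.
final class QuorumSketch {
    /** Smallest number of votes that forms a strict majority of groupSize peers. */
    static int majority(int groupSize) {
        return groupSize / 2 + 1;
    }

    /** The candidate counts its own vote plus the votes granted by other peers. */
    static boolean canWin(int groupSize, int grantedByOthers) {
        return 1 + grantedByOthers >= majority(groupSize);
    }

    public static void main(String[] args) {
        int groupSize = 3;        // DN1, DN2, DN3
        int grantedByOthers = 0;  // "received 0 response(s) and 2 exception(s)"
        System.out.println("majority needed: " + majority(groupSize));
        System.out.println("DN3 can win: " + canWin(groupSize, grantedByOthers));
    }
}
```

Under this assumption DN3 can never win while the other two peers no longer host the group, so it cycles between FOLLOWER and CANDIDATE as the log shows.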



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
