[
https://issues.apache.org/jira/browse/HDDS-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krishna Kumar Asawa reassigned HDDS-8343:
-----------------------------------------
Assignee: Nandakumar
> Failed to elect leader due to Ratis group not found
> ---------------------------------------------------
>
> Key: HDDS-8343
> URL: https://issues.apache.org/jira/browse/HDDS-8343
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Kaijie Chen
> Assignee: Nandakumar
> Priority: Major
>
> We encountered a problem during an upgrade
> (the problem may not have been caused directly by the upgrade).
> Suppose three DataNodes form a Ratis group.
> On DN1 and DN2, the pipeline was closed and the Ratis group has been deleted.
> On DN3, the Ratis group has not been deleted, so DN3 keeps trying to start an
> election but fails to elect a leader.
> In this situation, we cannot read data from this pipeline.
>
> Here are the logs from DN3:
> {code:java}
> 2023-03-16 18:39:22,527
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO
> org.apache.ratis.server.RaftServer$Division:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C: changes role from
> FOLLOWER to CANDIDATE at term 724 for changeToCandidate
> 2023-03-16 18:39:22,527
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] ERROR
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
> pipeline Action CLOSE on pipeline
> PipelineID=1b0b8153-71fd-437a-b486-bbea4a4fba6c.Reason :
> 207b98d9-ad64-45a8-940f-504b514feff5 is in candidate state for 322168ms
> 2023-03-16 18:39:22,527
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.leaderelection.pre-vote = false (custom)
> 2023-03-16 18:39:22,527
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO
> org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5:
> start
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310
> 2023-03-16 18:39:22,554
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.impl.LeaderElection:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310
> ELECTION round 0: submit vote requests at term 725 for 0:
> peers:[f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e|rpc:9.179.142.251:9858|dataStream:|priority:0|startupRole:FOLLOWER,
> 33b49c34-caa2-4b4f-894e-dce7db4f97b9|rpc:9.180.20.222:9858|dataStream:|priority:1|startupRole:FOLLOWER,
> 207b98d9-ad64-45a8-940f-504b514feff5|rpc:9.180.21.88:9858|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
> old=null
> 2023-03-16 18:39:22,554
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.first-election.timeout.min = 5s (fallback to
> raft.server.rpc.timeout.min)
> 2023-03-16 18:39:22,554
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to
> raft.server.rpc.timeout.max)
> 2023-03-16 18:39:22,554
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.impl.LeaderElection:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310 got
> exception when requesting votes: java.util.concurrent.ExecutionException:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
> exception
> 2023-03-16 18:39:22,556
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.impl.LeaderElection:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310 got
> exception when requesting votes: java.util.concurrent.ExecutionException:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL:
> f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e: group-BBEA4A4FBA6C not found.
> 2023-03-16 18:39:22,556
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.impl.LeaderElection:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310:
> ELECTION REJECTED received 0 response(s) and 2 exception(s):
> 2023-03-16 18:39:22,556
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.impl.LeaderElection: Exception 0:
> java.util.concurrent.ExecutionException:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
> exception
> 2023-03-16 18:39:22,556
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.impl.LeaderElection: Exception 1:
> java.util.concurrent.ExecutionException:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL:
> f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e: group-BBEA4A4FBA6C not found.
> 2023-03-16 18:39:22,556
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.impl.LeaderElection:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310
> ELECTION round 0: result REJECTED
> 2023-03-16 18:39:22,557
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.RaftServer$Division:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C: changes role from
> CANDIDATE to FOLLOWER at term 725 for REJECTED
> 2023-03-16 18:39:22,557
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.impl.RoleInfo:
> 207b98d9-ad64-45a8-940f-504b514feff5: shutdown
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310
> 2023-03-16 18:39:22,557
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310]
> INFO org.apache.ratis.server.impl.RoleInfo:
> 207b98d9-ad64-45a8-940f-504b514feff5: start
> 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState
> 2023-03-16 18:39:22,557
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.first-election.timeout.min = 5s (fallback to
> raft.server.rpc.timeout.min)
> 2023-03-16 18:39:22,557
> [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to
> raft.server.rpc.timeout.max)
> 2023-03-16 18:39:25,688
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO
> org.apache.ratis.server.impl.FollowerState:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState: change
> to CANDIDATE, lastRpcElapsedTime:5189254060ns, electionTimeout:5189ms
> 2023-03-16 18:39:25,688
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO
> org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5:
> shutdown 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState
> 2023-03-16 18:39:25,688
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO
> org.apache.ratis.server.RaftServer$Division:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D: changes role from
> FOLLOWER to CANDIDATE at term 706 for changeToCandidate
> 2023-03-16 18:39:25,688
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] ERROR
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
> pipeline Action CLOSE on pipeline
> PipelineID=4a5ed735-c797-45e8-a8e5-c8992d1fb40d.Reason :
> 207b98d9-ad64-45a8-940f-504b514feff5 is in candidate state for 325322ms
> 2023-03-16 18:39:25,688
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.leaderelection.pre-vote = false (custom)
> 2023-03-16 18:39:25,688
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO
> org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5:
> start
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311
> 2023-03-16 18:39:25,723
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.impl.LeaderElection:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311
> ELECTION round 0: submit vote requests at term 707 for 0:
> peers:[1e40274c-a4bd-4e3d-8479-59f8105ec408|rpc:100.76.18.99:9858|dataStream:|priority:1|startupRole:FOLLOWER,
> 207b98d9-ad64-45a8-940f-504b514feff5|rpc:9.180.21.88:9858|dataStream:|priority:0|startupRole:FOLLOWER,
> bcdf3bd5-7b8e-435d-b3fa-b3e29f0eb307|rpc:9.180.5.41:9858|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
> old=null
> 2023-03-16 18:39:25,723
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.first-election.timeout.min = 5s (fallback to
> raft.server.rpc.timeout.min)
> 2023-03-16 18:39:25,723
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to
> raft.server.rpc.timeout.max)
> 2023-03-16 18:39:25,723
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.impl.LeaderElection:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311 got
> exception when requesting votes: java.util.concurrent.ExecutionException:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
> exception
> 2023-03-16 18:39:25,723
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.impl.LeaderElection:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311 got
> exception when requesting votes: java.util.concurrent.ExecutionException:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
> exception
> 2023-03-16 18:39:25,723
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.impl.LeaderElection:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311:
> ELECTION REJECTED received 0 response(s) and 2 exception(s):
> 2023-03-16 18:39:25,723
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.impl.LeaderElection: Exception 0:
> java.util.concurrent.ExecutionException:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
> exception
> 2023-03-16 18:39:25,723
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.impl.LeaderElection: Exception 1:
> java.util.concurrent.ExecutionException:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
> exception
> 2023-03-16 18:39:25,724
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.impl.LeaderElection:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311
> ELECTION round 0: result REJECTED
> 2023-03-16 18:39:25,724
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.RaftServer$Division:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D: changes role from
> CANDIDATE to FOLLOWER at term 707 for REJECTED
> 2023-03-16 18:39:25,724
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.impl.RoleInfo:
> 207b98d9-ad64-45a8-940f-504b514feff5: shutdown
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311
> 2023-03-16 18:39:25,724
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311]
> INFO org.apache.ratis.server.impl.RoleInfo:
> 207b98d9-ad64-45a8-940f-504b514feff5: start
> 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState
> 2023-03-16 18:39:25,724
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.first-election.timeout.min = 5s (fallback to
> raft.server.rpc.timeout.min)
> 2023-03-16 18:39:25,724
> [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to
> raft.server.rpc.timeout.max)
> 2023-03-16 18:39:26,439
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO
> org.apache.ratis.server.impl.FollowerState:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState: change
> to CANDIDATE, lastRpcElapsedTime:5018766985ns, electionTimeout:5018ms
> 2023-03-16 18:39:26,439
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO
> org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5:
> shutdown 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState
> 2023-03-16 18:39:26,439
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO
> org.apache.ratis.server.RaftServer$Division:
> 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1: changes role from
> FOLLOWER to CANDIDATE at term 706 for changeToCandidate
> 2023-03-16 18:39:26,439
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] ERROR
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
> pipeline Action CLOSE on pipeline
> PipelineID=5d8061d6-1692-4eb4-a604-3c37ea22a9c1.Reason :
> 207b98d9-ad64-45a8-940f-504b514feff5 is in candidate state for 326082ms
> 2023-03-16 18:39:26,439
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.leaderelection.pre-vote = false (custom)
> 2023-03-16 18:39:26,439
> [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO
> org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5:
> start
> 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-LeaderElection312
> {code}
>
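The election summaries in the log above can be read against the usual Raft majority rule. What follows is a minimal Python sketch, not Ratis code. The helper names (majority, can_win, election_summaries, peers_missing_group) are hypothetical, and the regular expressions are assumptions derived only from the log lines quoted in this issue. The sketch assumes the candidate counts its own vote, as in standard Raft, and it ignores Ratis peer priorities.

```python
import re

# Assumed patterns, derived only from the log lines quoted in this issue.
SUMMARY = re.compile(
    r"ELECTION (\w+) received (\d+) response\(s\) and (\d+) exception\(s\)")
NOT_FOUND = re.compile(r"([0-9a-f-]{36}): (group-[0-9A-F]+) not found")


def majority(group_size):
    """Smallest number of votes that is more than half of the group."""
    return group_size // 2 + 1


def can_win(group_size, votes_from_peers):
    """True if the candidate's own vote plus peer votes reach a majority."""
    return votes_from_peers + 1 >= majority(group_size)


def election_summaries(log_text):
    """Return (result, responses, exceptions) for each election summary line."""
    return [(r, int(n), int(e)) for r, n, e in SUMMARY.findall(log_text)]


def peers_missing_group(log_text):
    """Return sorted unique (peer_id, group) pairs reported as 'not found'."""
    return sorted(set(NOT_FOUND.findall(log_text)))


# Two lines copied from the DN3 log above (line wrapping removed).
sample = (
    "ELECTION REJECTED received 0 response(s) and 2 exception(s):\n"
    "INTERNAL: f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e: "
    "group-BBEA4A4FBA6C not found.\n"
)

print(election_summaries(sample))
print(peers_missing_group(sample))
print(can_win(3, 0))
```

In a three-node group the candidate needs one vote besides its own. The log shows 0 responses and 2 exceptions in each round, so under these assumptions DN3 cannot reach a majority while the other two peers are unreachable or no longer have the group.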
--
This message was sent by Atlassian Jira
(v8.20.10#820010)