[
https://issues.apache.org/jira/browse/HBASE-28180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780725#comment-17780725
]
Duo Zhang commented on HBASE-28180:
-----------------------------------
OK, I think the problem is that we have this call after the master restarts:
{code}
// clear the dead servers with same host name and port of online server because we are not
// removing dead server with same hostname and port of rs which is trying to check in before
// master initialization. See HBASE-5916.
this.serverManager.clearDeadServersWithSameHostNameAndPortOfOnlineServer();
{code}
It clears the dead servers, so in ServerManager.expireServer we cannot tell
that the server is already dead, and we try to schedule an SCP for it again.
What makes things even worse is that in AssignmentManager.submitServerCrash we
still schedule an SCP even if the serverNode is null, in order to solve the
'unknownServers' problem...
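To make the sequence concrete, here is a minimal toy model of it. This is only a sketch, not the real HBase code: the class DuplicateScpSketch, its fields, and the simplified method signatures are all hypothetical, and it only models the dead-server check in expireServer, not the null serverNode case in submitServerCrash. Server names are assumed to follow the usual "host,port,startcode" form.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; NOT the real ServerManager or AssignmentManager. */
class DuplicateScpSketch {
    // Hypothetical stand-ins for the master's dead server list and the SCPs it scheduled.
    private final Set<String> deadServers = new HashSet<>();
    private final List<String> scheduledScps = new ArrayList<>();

    /** Simplified expireServer: schedule an SCP unless the server is already known dead. */
    boolean expireServer(String serverName) {
        if (deadServers.contains(serverName)) {
            return false; // already dead, so no second SCP
        }
        deadServers.add(serverName);
        scheduledScps.add(serverName);
        return true;
    }

    /** Simplified clearing step: forget dead servers that share host and port with an online server. */
    void clearDeadServersWithSameHostAndPort(String onlineHostAndPort) {
        deadServers.removeIf(s -> s.startsWith(onlineHostAndPort + ","));
    }

    /** Returns how many SCPs end up scheduled for the same crashed server. */
    static int runScenario(boolean clearAfterRestart) {
        DuplicateScpSketch master = new DuplicateScpSketch();
        String crashed = "rs1,16020,100"; // old instance, startcode 100
        master.expireServer(crashed);     // first SCP is scheduled
        if (clearAfterRestart) {
            // A new instance (e.g. rs1,16020,200) is online on the same host and port.
            master.clearDeadServersWithSameHostAndPort("rs1,16020");
        }
        master.expireServer(crashed);     // second expire for the same old instance
        return master.scheduledScps.size();
    }

    public static void main(String[] args) {
        System.out.println("SCPs without clearing: " + runScenario(false));
        System.out.println("SCPs with clearing:    " + runScenario(true));
    }
}
```

In this model, clearing the dead server entry is what lets the second expireServer call get past the "already dead" check and schedule a duplicate SCP.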
Let me think about how to deal with these things better...
> TestClusterRestartFailover fails in pre commit build
> ----------------------------------------------------
>
> Key: HBASE-28180
> URL: https://issues.apache.org/jira/browse/HBASE-28180
> Project: HBase
> Issue Type: Bug
> Components: master, proc-v2, test
> Reporter: Duo Zhang
> Priority: Major
> Attachments: org.apache.hadoop.hbase.master.TestClusterRestartFailover-output.txt
>
>
> It failed twice in this PR:
> https://github.com/apache/hbase/pull/5475
> Filed an issue to track this problem.