[ 
https://issues.apache.org/jira/browse/HBASE-28180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780725#comment-17780725
 ] 

Duo Zhang commented on HBASE-28180:
-----------------------------------

OK, I think the problem is that we make this call after the master restarts:

{code}
    // clear the dead servers with same host name and port of online server because we are not
    // removing dead server with same hostname and port of rs which is trying to check in before
    // master initialization. See HBASE-5916.
    this.serverManager.clearDeadServersWithSameHostNameAndPortOfOnlineServer();
{code}

This clears the dead servers, so later, in ServerManager.expireServer, we cannot 
tell that the server is already dead and we try to schedule an SCP for it. What 
makes things even worse is that in AssignmentManager.submitServerCrash, even if 
the serverNode is null, we still schedule an SCP for it, in order to solve the 
'unknownServers' problem...
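
To make the sequence concrete, here is a minimal sketch of how I understand the 
race. It is not the actual HBase code: DeadServerSketch, its fields, and its 
method bodies are hypothetical stand-ins, and the server names are made up. It 
only models "a dead server set that gets cleared" and "an expire path that 
schedules an SCP whenever the server is not in that set".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the master's bookkeeping; none of this is HBase code.
class DeadServerSketch {
    // Servers the master currently believes are dead ("host,port,startcode").
    private final Set<String> deadServers = new HashSet<>();
    // One entry per scheduled SCP, so duplicates are visible.
    private final List<String> scheduledScps = new ArrayList<>();

    // Stand-in for the expire path: skip servers already known to be dead,
    // otherwise record them and schedule an SCP unconditionally (modelling
    // the "schedule even if serverNode is null" behaviour).
    boolean expireServer(String serverName) {
        if (deadServers.contains(serverName)) {
            return false;
        }
        deadServers.add(serverName);
        scheduledScps.add(serverName);
        return true;
    }

    // Stand-in for clearing dead servers that share host and port with an
    // online server, regardless of start code.
    void clearDeadServersWithSameHostAndPort(String hostAndPort) {
        deadServers.removeIf(s -> s.startsWith(hostAndPort + ","));
    }

    // Number of SCPs scheduled for the given server name.
    long scpCount(String serverName) {
        return scheduledScps.stream().filter(serverName::equals).count();
    }

    public static void main(String[] args) {
        DeadServerSketch master = new DeadServerSketch();
        String oldRs = "rs1.example.org,16020,100"; // made-up server name

        master.expireServer(oldRs); // first SCP is scheduled
        master.expireServer(oldRs); // ignored, server is known to be dead

        // A new instance on the same host and port checks in after restart.
        master.clearDeadServersWithSameHostAndPort("rs1.example.org,16020");

        master.expireServer(oldRs); // dead state was lost, so a second SCP
        System.out.println("SCPs for old server: " + master.scpCount(oldRs));
    }
}
```

In this model the second SCP appears only because the clear call drops the dead 
state while nothing downstream refuses a server that has no node.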

Let me think how to better deal with these things...

> TestClusterRestartFailover fails in pre commit build
> ----------------------------------------------------
>
>                 Key: HBASE-28180
>                 URL: https://issues.apache.org/jira/browse/HBASE-28180
>             Project: HBase
>          Issue Type: Bug
>          Components: master, proc-v2, test
>            Reporter: Duo Zhang
>            Priority: Major
>         Attachments: 
> org.apache.hadoop.hbase.master.TestClusterRestartFailover-output.txt
>
>
> It failed two times in this PR.
> https://github.com/apache/hbase/pull/5475
> Filed an issue to track this problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
