[ 
https://issues.apache.org/jira/browse/HBASE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760115#comment-13760115
 ] 

Nicolas Liochon commented on HBASE-9451:
----------------------------------------

bq. I am trying to understand why it's hardcoded to 'false' for former case.
It's because if we don't have the status, then we can't know whether the 
server is dead, so we assume it is up.
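
To make the default concrete, here is a minimal sketch of that check. It is not the actual HConnectionManager code: the class name DeadServerCheckSketch, the nested StatusListener type, and the use of plain strings for server names are all assumptions made for illustration. Only the behaviour (return 'false' when there is no listener, otherwise ask the listener) comes from the discussion.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative model only; every name here is hypothetical, not HBase API.
class DeadServerCheckSketch {

    /** Hypothetical stand-in for a cluster status listener. */
    static final class StatusListener {
        private final Set<String> deadServers = new HashSet<>();

        /** Records that the cluster status reported this server as dead. */
        void markDead(String serverName) {
            deadServers.add(serverName);
        }

        boolean isDeadServer(String serverName) {
            return deadServers.contains(serverName);
        }
    }

    // Null when no status listener class is configured.
    private final StatusListener clusterStatusListener;

    DeadServerCheckSketch(StatusListener clusterStatusListener) {
        this.clusterStatusListener = clusterStatusListener;
    }

    boolean isDeadServer(String serverName) {
        if (clusterStatusListener == null) {
            // No status information at all: we cannot know, so assume the server is up.
            return false;
        }
        // With a listener, trust what the cluster status reported.
        return clusterStatusListener.isDeadServer(serverName);
    }
}
```

With a null listener the check always answers 'false'. With a listener it answers 'true' only for servers that were reported dead, which is the case hit in the test setup.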
                
> Meta remains unassigned when the meta server crashes with the 
> ClusterStatusListener set
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-9451
>                 URL: https://issues.apache.org/jira/browse/HBASE-9451
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>
> While running the tests described in HBASE-9338, I ran into this problem. The 
> hbase.status.listener.class was set to 
> org.apache.hadoop.hbase.client.ClusterStatusListener$MultiCastListener.
> 1. I had the meta server coming down
> 2. The metaSSH got triggered. The call chain:
>    2.1 verifyAndAssignMetaWithRetries
>    2.2 verifyMetaRegionLocation
>    2.3 waitForMetaServerConnection
>    2.4 getMetaServerConnection
>    2.5 getCachedConnection
>    2.6 HConnectionManager.getAdmin(serverName, false)
>    2.7 isDeadServer(serverName) -> This is hardcoded to return 'false' when 
> the clusterStatusListener field is null. If clusterStatusListener is not null 
> (in my test), then it could return true in certain cases (and in this case, 
> indeed it should return true since the server is down). I am trying to 
> understand why it's hardcoded to 'false' for former case.
> 3. When isDeadServer returns true, the method 
> HConnectionManager.getAdmin(ServerName, boolean) throws 
> RegionServerStoppedException.
> 4. Finally, once the retries are exhausted, verifyAndAssignMetaWithRetries 
> gives up and the master aborts.
> The methods in the above call chain don't handle 
> RegionServerStoppedException. Maybe something to look at... 
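
Steps 3 and 4 of the report can be modelled roughly as follows. This is a simplified sketch, not HBase code: MetaVerifyRetrySketch, ServerStoppedException (standing in for RegionServerStoppedException), and the method signatures are hypothetical. The report does not show how the exception actually travels through the call chain, so catching it directly in the retry loop is an assumption.

```java
import java.util.function.Predicate;

// Illustrative model only; every name here is hypothetical, not HBase API.
class MetaVerifyRetrySketch {

    /** Hypothetical stand-in for RegionServerStoppedException. */
    static final class ServerStoppedException extends Exception {
        ServerStoppedException(String message) {
            super(message);
        }
    }

    /** Models getAdmin: refuses to hand out a connection to a server reported dead. */
    static String getAdmin(String serverName, Predicate<String> isDeadServer)
            throws ServerStoppedException {
        if (isDeadServer.test(serverName)) {
            throw new ServerStoppedException(serverName + " is reported dead");
        }
        return "admin:" + serverName;
    }

    /**
     * Models the retry loop: returns true once a connection is obtained,
     * false when all attempts fail (the point at which the master aborts).
     */
    static boolean verifyWithRetries(String metaServer,
                                     Predicate<String> isDeadServer,
                                     int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                getAdmin(metaServer, isDeadServer);
                return true;
            } catch (ServerStoppedException e) {
                // Nothing treats "server is dead" specially, so the same
                // dead server is simply tried again on the next attempt.
            }
        }
        return false;
    }
}
```

In this model, a meta server that stays flagged as dead makes every attempt fail in the same way, so the loop can only end by giving up.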

