[ https://issues.apache.org/jira/browse/HBASE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763841#comment-13763841 ]

Hudson commented on HBASE-9451:
-------------------------------

SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #721 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/721/])
HBASE-9451  Meta remains unassigned when the meta server crashes with the 
ClusterStatusListener set (nkeywal: rev 1521513)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java

                
> Meta remains unassigned when the meta server crashes with the 
> ClusterStatusListener set
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-9451
>                 URL: https://issues.apache.org/jira/browse/HBASE-9451
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Nicolas Liochon
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: 9451.v1.patch
>
>
> While running the tests described in HBASE-9338, I ran into this problem. The 
> hbase.status.listener.class was set to 
> org.apache.hadoop.hbase.client.ClusterStatusListener$MultiCastListener.
> 1. The meta server went down.
> 2. The metaSSH got triggered. The call chain:
>    2.1 verifyAndAssignMetaWithRetries
>    2.2 verifyMetaRegionLocation
>    2.3 waitForMetaServerConnection
>    2.4 getMetaServerConnection
>    2.5 getCachedConnection
>    2.6 HConnectionManager.getAdmin(serverName, false)
>    2.7 isDeadServer(serverName) -> This is hardcoded to return 'false' when 
> the clusterStatusListener field is null. If clusterStatusListener is not null 
> (as in my test), it can return true in certain cases (and in this case it 
> should indeed return true, since the server is down). I am trying to 
> understand why it is hardcoded to 'false' in the former case.
> 3. When isDeadServer returns true, the method 
> HConnectionManager.getAdmin(ServerName, boolean) throws 
> RegionServerStoppedException.
> 4. Finally, after the retries are exhausted, verifyAndAssignMetaWithRetries 
> gives up and the master aborts.
> The methods in the above call chain don't handle 
> RegionServerStoppedException. Maybe something to look at... 
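
The behaviour described in steps 2.7 and 3 can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions: DeadServerSketch, StatusListener and the nested RegionServerStoppedException are hypothetical stand-ins, not the real HBase classes, and the exception handling in verifyMetaRegionLocation is one plausible way a caller could react, not necessarily what 9451.v1.patch does.

```java
import java.util.Set;

// Simplified model of the reported behaviour; these are NOT the real HBase classes.
class DeadServerSketch {

    /** Hypothetical stand-in for HBase's RegionServerStoppedException. */
    static class RegionServerStoppedException extends Exception {
        RegionServerStoppedException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the optional cluster status listener. */
    interface StatusListener {
        boolean isDeadServer(String serverName);
    }

    // null models the case where no listener class is configured
    private final StatusListener listener;

    DeadServerSketch(StatusListener listener) {
        this.listener = listener;
    }

    /** Step 2.7: without a listener there is no liveness information, so answer false. */
    boolean isDeadServer(String serverName) {
        return listener != null && listener.isDeadServer(serverName);
    }

    /** Step 3: refuse to hand out a connection to a server known to be dead. */
    Object getAdmin(String serverName) throws RegionServerStoppedException {
        if (isDeadServer(serverName)) {
            throw new RegionServerStoppedException(serverName + " is dead");
        }
        return new Object(); // placeholder for an admin stub
    }

    /** Assumed handling: treat the exception as "location not verified". */
    boolean verifyMetaRegionLocation(String serverName) {
        try {
            getAdmin(serverName);
            return true;
        } catch (RegionServerStoppedException e) {
            return false; // lets the caller go on to reassign meta instead of retrying
        }
    }

    public static void main(String[] args) {
        Set<String> dead = Set.of("rs1");
        DeadServerSketch withListener = new DeadServerSketch(dead::contains);
        DeadServerSketch withoutListener = new DeadServerSketch(null);
        System.out.println(withListener.verifyMetaRegionLocation("rs1"));
        System.out.println(withoutListener.verifyMetaRegionLocation("rs1"));
    }
}
```

In this model the no-listener instance never throws from getAdmin, which is why the problem only shows up once a listener is configured; in a real cluster the connection attempt itself would then fail in some other way.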

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
