[ https://issues.apache.org/jira/browse/HBASE-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657855#comment-13657855 ]
Hudson commented on HBASE-8537:
-------------------------------
Integrated in hbase-0.95-on-hadoop2 #99 (See
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/99/])
HBASE-8537 Dead region server pulled in from ZK (Revision 1482636)
Result = FAILURE
jxiang :
Files :
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ServerName.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
> Dead region server pulled in from ZK
> ------------------------------------
>
> Key: HBASE-8537
> URL: https://issues.apache.org/jira/browse/HBASE-8537
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: Jimmy Xiang
> Assignee: Jimmy Xiang
> Priority: Minor
> Fix For: 0.98.0, 0.95.1
>
> Attachments: trunk-8537.patch, trunk-8537_v2.patch,
> trunk-8537_v3.patch
>
>
> When a cluster restarts quickly after a crash, the master still pulls in the
> dead region server instance from ZooKeeper, even though a new region server
> instance has already reported in.
> {noformat}
> 2013-05-12 18:32:52,996 INFO [IPC Server handler 6 on 36000] org.apache.hadoop.hbase.master.ServerManager: Registering server=a1217.halxg.cloudera.com,36020,1368408767773
> ....
> 2013-05-12 18:32:54,653 INFO [master-a1220.halxg.cloudera.com,36000,1368408767520] org.apache.hadoop.hbase.master.HMaster: Registering server found up in zk but who has not yet reported in: a1217.halxg.cloudera.com,36020,1368378273768
> 2013-05-12 18:32:54,653 INFO [master-a1220.halxg.cloudera.com,36000,1368408767520] org.apache.hadoop.hbase.master.ServerManager: Registering server=a1217.halxg.cloudera.com,36020,1368378273768
> {noformat}
> We should not pull in the second region server instance from ZooKeeper; it is
> actually dead. We can detect this from the hostname and the port, assuming
> that no two region server instances can be alive on the same host and port at
> the same time. To be more cautious, we can check the timestamp as well: the
> live instance should be the one with the later timestamp, not the one pulled
> in from ZooKeeper.
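
The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the committed patch: the class StaleServerFilter and its methods are hypothetical, and server names are treated as plain "host,port,startcode" strings like those in the log excerpt rather than as HBase ServerName objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of HBase.
final class StaleServerFilter {

    // "host,port,startcode" -> "host,port"
    static String hostAndPort(String serverName) {
        return serverName.substring(0, serverName.lastIndexOf(','));
    }

    // "host,port,startcode" -> startcode (the timestamp at the end of the name)
    static long startCode(String serverName) {
        return Long.parseLong(serverName.substring(serverName.lastIndexOf(',') + 1));
    }

    /**
     * Returns the ZooKeeper entries that may still be pulled in: those whose
     * host and port have no reported instance with an equal or later start
     * code. An entry with an equal start code has already reported in, and an
     * entry with an earlier one is assumed to be dead.
     */
    static List<String> selectFromZk(List<String> reported, List<String> fromZk) {
        // Latest start code seen so far for each host,port among reported servers.
        Map<String, Long> newestReported = new HashMap<>();
        for (String server : reported) {
            newestReported.merge(hostAndPort(server), startCode(server), Long::max);
        }
        List<String> toRegister = new ArrayList<>();
        for (String server : fromZk) {
            Long live = newestReported.get(hostAndPort(server));
            if (live == null || startCode(server) > live) {
                toRegister.add(server);
            }
        }
        return toRegister;
    }
}
```

With the two instances from the log excerpt, the entry with start code 1368378273768 would be skipped because the instance with start code 1368408767773 on the same host and port has already reported in.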