[ https://issues.apache.org/jira/browse/HBASE-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jimmy Xiang updated HBASE-8537:
-------------------------------
    Status: Open  (was: Patch Available)

> Dead region server pulled in from ZK
> ------------------------------------
>
>                 Key: HBASE-8537
>                 URL: https://issues.apache.org/jira/browse/HBASE-8537
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.98.0
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>            Priority: Minor
>         Attachments: trunk-8537.patch
>
>
> When a cluster restarts quickly after a crash, the master still pulls in the
> dead region server from ZK even though a new region server instance on the
> same host has already reported in.
> {noformat}
> 2013-05-12 18:32:52,996 INFO [IPC Server handler 6 on 36000] org.apache.hadoop.hbase.master.ServerManager: Registering server=a1217.halxg.cloudera.com,36020,1368408767773
> ....
> 2013-05-12 18:32:54,653 INFO [master-a1220.halxg.cloudera.com,36000,1368408767520] org.apache.hadoop.hbase.master.HMaster: Registering server found up in zk but who has not yet reported in: a1217.halxg.cloudera.com,36020,1368378273768
> 2013-05-12 18:32:54,653 INFO [master-a1220.halxg.cloudera.com,36000,1368408767520] org.apache.hadoop.hbase.master.ServerManager: Registering server=a1217.halxg.cloudera.com,36020,1368378273768
> {noformat}
> We should not pull in the second region server instance from ZK; it is
> actually dead. We can figure this out from the hostname and the port, since
> we can assume that no two region server instances can be alive on the same
> host and port at the same time. To be more cautious, we can check the
> timestamp as well: the live instance should be the one with the later
> timestamp, not the one pulled in from ZK.
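
The check proposed in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the content of trunk-8537.patch: the class StaleServerCheck, the type ServerId, and the method isStale are hypothetical names, and the code assumes a server identity is the "host,port,startcode" triple that appears in the log lines above.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleServerCheck {

    /** Hypothetical stand-in for a region server identity: host, port, start timestamp. */
    static final class ServerId {
        final String host;
        final int port;
        final long startCode;

        ServerId(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }

        /** Parses the "host,port,startcode" form seen in the log lines. */
        static ServerId parse(String s) {
            String[] parts = s.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected host,port,startcode: " + s);
            }
            return new ServerId(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }

        boolean sameHostAndPort(ServerId other) {
            return host.equalsIgnoreCase(other.host) && port == other.port;
        }
    }

    /**
     * Returns true when a server that has already reported in shares the host and
     * port of the ZK candidate and has a later start timestamp. Under the
     * assumption that only one instance can be alive per host and port, the ZK
     * candidate must then be a dead, older instance.
     */
    static boolean isStale(ServerId fromZk, List<ServerId> reportedIn) {
        for (ServerId live : reportedIn) {
            if (live.sameHostAndPort(fromZk) && live.startCode > fromZk.startCode) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<ServerId> reportedIn = new ArrayList<>();
        // The instance that registered first in the log excerpt.
        reportedIn.add(ServerId.parse("a1217.halxg.cloudera.com,36020,1368408767773"));
        // The instance later found in ZK with an older start timestamp.
        ServerId fromZk = ServerId.parse("a1217.halxg.cloudera.com,36020,1368378273768");
        System.out.println(isStale(fromZk, reportedIn)); // prints true
    }
}
```

With the two identities from the log excerpt, the ZK entry has the earlier timestamp, so it would be skipped instead of registered. An entry with the same timestamp as a reported server is treated as the same instance and is not flagged.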