[ 
https://issues.apache.org/jira/browse/HBASE-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-8537:
-------------------------------

    Status: Patch Available  (was: Open)

Attached a patch for trunk. When a region server is pulled in from zk, the 
patch adds a check to make sure no region server with the same host and port 
is already registered.

If the rs pulled in from zk has an older timestamp than the registered one, it 
is rejected.  Otherwise, it is ignored and is expected to report itself in 
(there could be a race).
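
The check described above could be sketched roughly as follows.  This is only 
an illustration, not the code in trunk-8537.patch: the class ZkServerCheck, 
the Decision enum, and the methods reportIn and checkFromZk are all 
hypothetical names, and the sketch assumes a server name always has the form 
host,port,startcode as seen in the logs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the HBase code base.
public class ZkServerCheck {

    public enum Decision { REGISTER, REJECT_STALE, IGNORE_AWAIT_REPORT }

    // Maps "host,port" to the start code (timestamp) of the registered server.
    private final Map<String, Long> registered = new HashMap<>();

    // Assumes the name looks like "host,port,startcode".
    private static String hostAndPort(String serverName) {
        return serverName.substring(0, serverName.lastIndexOf(','));
    }

    private static long startCode(String serverName) {
        return Long.parseLong(serverName.substring(serverName.lastIndexOf(',') + 1));
    }

    // Called when a region server reports in to the master by itself.
    public void reportIn(String serverName) {
        registered.put(hostAndPort(serverName), startCode(serverName));
    }

    // Called for a server found in zk that has not reported in yet.
    public Decision checkFromZk(String serverName) {
        Long existing = registered.get(hostAndPort(serverName));
        if (existing == null) {
            // Nothing registered on this host and port: pull it in as before.
            return Decision.REGISTER;
        }
        if (startCode(serverName) < existing) {
            // Older than the registered instance: treat it as dead.
            return Decision.REJECT_STALE;
        }
        // Same or newer timestamp: do not register, let it report in itself.
        return Decision.IGNORE_AWAIT_REPORT;
    }

    public static void main(String[] args) {
        ZkServerCheck check = new ZkServerCheck();
        // Server names taken from the log excerpt in the issue description.
        check.reportIn("a1217.halxg.cloudera.com,36020,1368408767773");
        System.out.println(
            check.checkFromZk("a1217.halxg.cloudera.com,36020,1368378273768"));
    }
}
```

With the two server names from the log, the instance found in zk has the 
smaller start code, so the sketch rejects it instead of registering it.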
                
> Dead region server pulled in from ZK
> ------------------------------------
>
>                 Key: HBASE-8537
>                 URL: https://issues.apache.org/jira/browse/HBASE-8537
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.98.0
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>            Priority: Minor
>         Attachments: trunk-8537.patch
>
>
> When a cluster restarts quickly after a crash, the master still pulls in the 
> dead region server from zk, even though a new region server has already 
> reported in.
> {noformat}
> 2013-05-12 18:32:52,996 INFO  [IPC Server handler 6 on 36000] 
> org.apache.hadoop.hbase.master.ServerManager: Registering 
> server=a1217.halxg.cloudera.com,36020,1368408767773
> ....
> 2013-05-12 18:32:54,653 INFO  
> [master-a1220.halxg.cloudera.com,36000,1368408767520] 
> org.apache.hadoop.hbase.master.HMaster: Registering server found up in zk but 
> who has not yet reported in: a1217.halxg.cloudera.com,36020,1368378273768
> 2013-05-12 18:32:54,653 INFO  
> [master-a1220.halxg.cloudera.com,36000,1368408767520] 
> org.apache.hadoop.hbase.master.ServerManager: Registering 
> server=a1217.halxg.cloudera.com,36020,1368378273768
> {noformat}
> We should not pull in the second region server instance from zk.  It is 
> actually dead.  We can figure this out from the hostname and the port: we can 
> assume no two region server instances can be alive on the same host and the 
> same port.  To be more cautious, we can check the timestamp as well.  The live 
> one should be the one with the later timestamp, not the one pulled in from zk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
