[ 
https://issues.apache.org/jira/browse/HBASE-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656206#comment-13656206
 ] 

Jean-Daniel Cryans commented on HBASE-8537:
-------------------------------------------

I'm not sure, [~jxiang]. Here's what happens when I test it locally:

{noformat}
2013-05-13 11:18:36,471 INFO org.apache.hadoop.hbase.master.ServerManager: 
Server serverName=172.21.3.117,60020,1368469116206 rejected; we already have 
172.21.3.117,60020,1368469063154 registered with same hostname and port
2013-05-13 11:18:36,471 INFO org.apache.hadoop.hbase.master.ServerManager: 
Triggering server recovery; existingServer 172.21.3.117,60020,1368469063154 
looks stale, new server:172.21.3.117,60020,1368469116206
2013-05-13 11:18:36,472 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
based on AM, current region=-ROOT-,,0.70236052 is on 
server=172.21.3.117,60020,1368469063154 server being checked: 
172.21.3.117,60020,1368469063154
2013-05-13 11:18:36,473 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
based on AM, current region=.META.,,1.1028785192 is on 
server=172.21.3.117,60020,1368469063154 server being checked: 
172.21.3.117,60020,1368469063154
2013-05-13 11:18:36,474 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Added=172.21.3.117,60020,1368469063154 to dead servers, submitted shutdown 
handler to be executed, root=true, meta=true
{noformat}

                
> Dead region server pulled in from ZK
> ------------------------------------
>
>                 Key: HBASE-8537
>                 URL: https://issues.apache.org/jira/browse/HBASE-8537
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.98.0
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>            Priority: Minor
>
> When a cluster restarts quickly after a crash, the master still pulls in the 
> dead region server from zk, even though a new region server has already 
> reported in.
> {noformat}
> 2013-05-12 18:32:52,996 INFO  [IPC Server handler 6 on 36000] 
> org.apache.hadoop.hbase.master.ServerManager: Registering 
> server=a1217.halxg.cloudera.com,36020,1368408767773
> ....
> 2013-05-12 18:32:54,653 INFO  
> [master-a1220.halxg.cloudera.com,36000,1368408767520] 
> org.apache.hadoop.hbase.master.HMaster: Registering server found up in zk but 
> who has not yet reported in: a1217.halxg.cloudera.com,36020,1368378273768
> 2013-05-12 18:32:54,653 INFO  
> [master-a1220.halxg.cloudera.com,36000,1368408767520] 
> org.apache.hadoop.hbase.master.ServerManager: Registering 
> server=a1217.halxg.cloudera.com,36020,1368378273768
> {noformat}
> We should not pull in the second region server instance from zk; it is 
> actually dead.  We can figure this out from the hostname and the port, since 
> we can assume no two region server instances are alive on the same host and 
> port.  To be more cautious, we can check the timestamp as well: the live one 
> should be the one with the later timestamp, not the one pulled in from zk.
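
For illustration only, the host/port/timestamp check proposed in the description could be sketched roughly as below. This is not the HBase implementation or its API: the class and method names (StaleServerCheck, Instance, isStale) are hypothetical, and the sketch assumes a server name has the "host,port,timestamp" form seen in the logs.

```java
// Hypothetical helper, not part of HBase: sketches the proposed
// "same host and port, older timestamp means dead" check.
final class StaleServerCheck {

    // Minimal stand-in for a "host,port,timestamp" server name as printed in the logs.
    static final class Instance {
        final String host;
        final int port;
        final long startTimestamp;

        Instance(String host, int port, long startTimestamp) {
            this.host = host;
            this.port = port;
            this.startTimestamp = startTimestamp;
        }

        // Parses e.g. "a1217.halxg.cloudera.com,36020,1368408767773".
        static Instance parse(String serverName) {
            String[] parts = serverName.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected host,port,timestamp: " + serverName);
            }
            return new Instance(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }

        boolean sameHostAndPort(Instance other) {
            return host.equals(other.host) && port == other.port;
        }
    }

    // True when the instance found in zk shares host and port with an instance
    // that has already reported in and has a strictly later start timestamp.
    static boolean isStale(Instance fromZk, Instance reported) {
        return fromZk.sameHostAndPort(reported)
                && fromZk.startTimestamp < reported.startTimestamp;
    }

    public static void main(String[] args) {
        Instance fromZk = Instance.parse("a1217.halxg.cloudera.com,36020,1368378273768");
        Instance reported = Instance.parse("a1217.halxg.cloudera.com,36020,1368408767773");
        System.out.println("stale instance from zk: " + isStale(fromZk, reported));
    }
}
```

With the two server names from the log excerpt above, the instance found in zk carries the earlier timestamp, so a check like this would skip it rather than register it.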
