Jean-Daniel Cryans created HBASE-10271:
------------------------------------------

             Summary: [regression] Cannot use the wildcard address since HBASE-9593
                 Key: HBASE-10271
                 URL: https://issues.apache.org/jira/browse/HBASE-10271
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.96.1, 0.94.13
            Reporter: Jean-Daniel Cryans
            Priority: Critical


HBASE-9593 moved the creation of the ephemeral znode earlier in the region 
server startup process, to a point where we don't yet have access to the 
ServerName from the Master's POV. HRS.getMyEphemeralNodePath() calls 
HRS.getServerName(), which at that point will return this.isa.getHostName(). 
If you set hbase.regionserver.ipc.address to 0.0.0.0, you will create a znode 
with that address.
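
To make the failure mode concrete, here is a minimal sketch of how a bind 
address could be checked for the wildcard before it is used to build a znode 
name. The class WildcardAddressCheck and its isWildcard() method are 
hypothetical and not part of HBase; only the java.net calls are real.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical helper for illustration only; not an HBase class.
final class WildcardAddressCheck {

    /** Returns true when the socket address is the wildcard (any-local) address. */
    static boolean isWildcard(InetSocketAddress isa) {
        // getAddress() returns null when the host name could not be resolved.
        InetAddress addr = isa.getAddress();
        return addr != null && addr.isAnyLocalAddress();
    }

    public static void main(String[] args) {
        // 0.0.0.0 is what hbase.regionserver.ipc.address is set to in this report.
        System.out.println(isWildcard(new InetSocketAddress("0.0.0.0", 60020)));
        // A concrete address such as loopback is not the wildcard.
        System.out.println(isWildcard(new InetSocketAddress("127.0.0.1", 60020)));
    }
}
```

A check like this only detects the condition; which host name should go into 
the znode instead is left open here.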

What happens next is that the RS will report for duty correctly but the master 
will do this:

{noformat}
2014-01-02 11:45:49,498 INFO  [master:172.21.3.117:60000] master.ServerManager: Registering server=0:0:0:0:0:0:0:0%0,60020,1388691892014
2014-01-02 11:45:49,498 INFO  [master:172.21.3.117:60000] master.HMaster: Registered server found up in zk but who has not yet reported in: 0:0:0:0:0:0:0:0%0,60020,1388691892014
{noformat}

The cluster is then unusable.

I think a better solution is to track the heartbeats for the region servers and 
expire those that haven't checked in for some time. The 0.89-fb branch has this 
concept, and they also use it to detect rack failures: 
https://github.com/apache/hbase/blob/0.89-fb/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L1224

In this JIRA's scope I would just add the heartbeat tracking and add a unit 
test for the wildcard address.
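
A rough sketch of what the heartbeat tracking could look like follows. It is 
not the 0.89-fb implementation; HeartbeatTracker, recordHeartbeat() and 
expireSilentServers() are hypothetical names, and it assumes that registering 
a server found in ZK counts as its first heartbeat, so a server that never 
reports in would eventually be expired.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names are illustrative and not HBase's ServerManager API.
final class HeartbeatTracker {
    // Server name -> timestamp (ms) of the last heartbeat seen from it.
    private final Map<String, Long> lastHeartbeatMs = new ConcurrentHashMap<>();
    private final long timeoutMs;

    HeartbeatTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Called on registration and on every report from a region server. */
    void recordHeartbeat(String serverName, long nowMs) {
        lastHeartbeatMs.put(serverName, nowMs);
    }

    /** Removes and returns the servers that have been silent longer than the timeout. */
    List<String> expireSilentServers(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeatMs.entrySet()) {
            Long seen = e.getValue();
            if (nowMs - seen > timeoutMs) {
                // remove(key, value) succeeds only if no newer heartbeat arrived meanwhile.
                if (lastHeartbeatMs.remove(e.getKey(), seen)) {
                    expired.add(e.getKey());
                }
            }
        }
        return expired;
    }
}
```

The master would call expireSilentServers() periodically, for example from a 
chore, and run its usual server-expiration handling for each returned name. 
Timestamps are passed in explicitly so that a unit test can drive the clock.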

What do you think [~rajesh23]?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
