[
https://issues.apache.org/jira/browse/HBASE-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861017#comment-13861017
]
Jean-Daniel Cryans commented on HBASE-10271:
--------------------------------------------
bq. If suggested solution is implemented (tracking and expiring based on
heartbeats), do we need ZK RS lease at all?
I don't remember the whole discussion, but FB had a reason to rely on both the
ZK timeout and the heartbeat timeout for rack failures. It might be in a
presentation somewhere online.
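
For illustration only, the heartbeat-based expiry discussed in this issue could
be sketched roughly as below. This is a minimal sketch, not the HBase or
0.89-fb ServerManager code: the class name HeartbeatTracker, its methods, and
the timeout parameter are all hypothetical, and timestamps are passed in
explicitly so the behaviour is easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of heartbeat tracking; not actual HBase code. */
public class HeartbeatTracker {
    // Assumed: maximum silence, in milliseconds, before a server is expired.
    private final long timeoutMs;
    // Server name -> timestamp (ms) of the last heartbeat received.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    public HeartbeatTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Record a heartbeat from the given server at the given time. */
    public void heartbeat(String serverName, long nowMs) {
        lastSeen.put(serverName, nowMs);
    }

    /** Remove and return the servers whose last heartbeat is older than the timeout. */
    public List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                // Conditional remove: a fresh heartbeat that raced in is not discarded.
                if (lastSeen.remove(e.getKey(), e.getValue())) {
                    expired.add(e.getKey());
                }
            }
        }
        return expired;
    }

    /** True if the server has a recorded heartbeat that has not been expired. */
    public boolean isTracked(String serverName) {
        return lastSeen.containsKey(serverName);
    }
}
```

A master-side chore would call expire() periodically and hand the returned
names to whatever server-expiration path already exists. Whether the ZK lease
is still needed on top of this is the open question above.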
> [regression] Cannot use the wildcard address since HBASE-9593
> -------------------------------------------------------------
>
> Key: HBASE-10271
> URL: https://issues.apache.org/jira/browse/HBASE-10271
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.13, 0.96.1
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Fix For: 0.94.16
>
>
> HBASE-9593 moved the creation of the ephemeral znode earlier in the region
> server startup process such that we don't have access to the ServerName from
> the Master's POV. HRS.getMyEphemeralNodePath() calls HRS.getServerName()
> which at that point will return this.isa.getHostName(). If you set
> hbase.regionserver.ipc.address to 0.0.0.0, you will create a znode with that
> address.
> What happens next is that the RS will report for duty correctly but the
> master will do this:
> {noformat}
> 2014-01-02 11:45:49,498 INFO [master:172.21.3.117:60000]
> master.ServerManager: Registering server=0:0:0:0:0:0:0:0%0,60020,1388691892014
> 2014-01-02 11:45:49,498 INFO [master:172.21.3.117:60000] master.HMaster:
> Registered server found up in zk but who has not yet reported in:
> 0:0:0:0:0:0:0:0%0,60020,1388691892014
> {noformat}
> The cluster is then unusable.
> I think a better solution is to track the heartbeats for the region servers
> and expire those that haven't checked in for some time. The 0.89-fb branch
> has this concept, and they also use it to detect rack failures:
> https://github.com/apache/hbase/blob/0.89-fb/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L1224.
> In this jira's scope I would just add the heartbeat tracking and add a unit
> test for the wildcard address.
> What do you think [~rajesh23]?
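
As a sketch of what a unit test for the wildcard address might guard against,
the snippet below detects a wildcard bind address and substitutes a separately
supplied hostname. The class AdvertisedHost and its resolve() method are
hypothetical names for illustration and are not part of HBase; only the
java.net calls are real.

```java
import java.net.InetSocketAddress;

/** Hypothetical helper; not actual HBase code. */
public class AdvertisedHost {
    /**
     * Returns the host to advertise (for example in a znode name).
     * If the bind address is the wildcard (0.0.0.0 or ::), the fallback
     * hostname is returned instead, since the wildcard is not routable.
     */
    public static String resolve(InetSocketAddress bindAddress, String fallbackHostname) {
        if (bindAddress.getAddress() != null
                && bindAddress.getAddress().isAnyLocalAddress()) {
            return fallbackHostname;
        }
        // getHostString() avoids a reverse DNS lookup.
        return bindAddress.getHostString();
    }
}
```

For instance, resolve(new InetSocketAddress("0.0.0.0", 60020), "rs1.example.com")
would yield the fallback name rather than the wildcard address.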