[ 
https://issues.apache.org/jira/browse/HBASE-17733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895767#comment-15895767
 ] 

stack commented on HBASE-17733:
-------------------------------

[~Apache9] Good point on rolling upgrade. The Master would have to keep its ears
open for ephemeral node evaporation for a version or two. We should add a
section to the design doc on HOW to do this (smile).
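One way the Master could keep both ears open is sketched below, in Java and under stated assumptions. During the transition it tracks old-version RSs, which only register an ephemeral znode, separately from upgraded RSs, which heartbeat. It then declares each server dead by whichever signal applies to that server. `MixedModeLiveness` and all of its methods are hypothetical names for illustration, not HBase classes. `onZnodeDeleted` stands in for whatever the existing znode watcher callback does today and is not a ZooKeeper or HBase API. Time is passed in explicitly so the logic can be exercised without a real clock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not HBase code: Master-side liveness during a rolling
// upgrade where old and new RegionServers run side by side.
class MixedModeLiveness {
  private final long timeoutMillis;
  // Old-version RSs: registered an ephemeral znode, never heartbeat.
  private final Set<String> legacy = new HashSet<>();
  // Upgraded RSs: last heartbeat time in ms, no znode.
  private final Map<String, Long> heartbeating = new HashMap<>();

  MixedModeLiveness(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  synchronized void registerLegacy(String server) {
    legacy.add(server);
  }

  synchronized void registerHeartbeating(String server, long nowMillis) {
    heartbeating.put(server, nowMillis);
  }

  // Refreshes the timestamp only for servers that registered as heartbeating.
  synchronized void heartbeat(String server, long nowMillis) {
    heartbeating.computeIfPresent(server, (s, last) -> nowMillis);
  }

  // Old path: the znode watcher fired. Returns true if this was a legacy RS
  // that should now go through server shutdown handling.
  synchronized boolean onZnodeDeleted(String server) {
    return legacy.remove(server);
  }

  // New path: removes and returns upgraded RSs whose heartbeats have lapsed.
  // Legacy RSs are never expired here, since they do not heartbeat.
  synchronized List<String> expireHeartbeats(long nowMillis) {
    List<String> dead = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = heartbeating.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (nowMillis - e.getValue() > timeoutMillis) {
        dead.add(e.getKey());
        it.remove();
      }
    }
    return dead;
  }
}
```

Once no old-version RSs remain in a cluster, the legacy path and its znode watcher could presumably be dropped in a later version.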


> Undo registering regionservers in zk with ephemeral nodes; it's more trouble 
> than it's worth
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17733
>                 URL: https://issues.apache.org/jira/browse/HBASE-17733
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: stack
>
> Elsewhere, we are undoing the use of ZK (replication current WAL offset, 
> regions-in-transition, etc.).
> I have another case where using ZK, while convenient (call-backs), has holes.
> The scenario is prompted by review of HBASE-9593.
> Currently, an RS registers with the Master by calling the Master's 
> reportForDuty. After the Master responds with the name we are to use for 
> ourselves (as well as other properties we need to 'run'), we then turn around 
> and make a new RPC out to the zk ensemble to register an ephemeral znode for 
> the RS.
> We notice that an RS has gone away (crashed) because its znode evaporates and 
> a watcher fires on the Master notifying it that the RS is gone (after a zk 
> session timeout of tens of seconds). This is cumbersome (setting watchers, zk 
> session timeouts) and indirect. The Master then trips the server shutdown 
> handler, which reassigns the regions from the crashed server.
> In HBASE-9593, we were trying to handle the rare but possible case where the 
> RS dies after registering w/ the Master but before it puts up its 
> ephemeral znode. In this case the RS would live on in the Master's internals 
> forever, because there is no ephemeral znode to expire and trigger cleanup 
> and removal of the never-started RS.
> Let's get ZK out of the loop, so that only the Master and RS are involved, 
> heartbeating each other.
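A minimal Java sketch of the Master side of the heartbeat idea follows, under stated assumptions. It assumes the liveness clock starts when the Master answers reportForDuty, rather than at the first heartbeat. That choice is what would close the HBASE-9593 hole, since an RS that dies before ever heartbeating still times out and gets cleaned up. `HeartbeatTracker`, its methods, and the timeout value used in the usage example are hypothetical illustrations, not existing HBase classes or configuration. Time is passed in explicitly so the logic can be exercised without a real clock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical Master-side liveness tracker; not an HBase class.
class HeartbeatTracker {
  private final long timeoutMillis;
  // Last time (ms) each registered RS was heard from.
  private final Map<String, Long> lastSeen = new HashMap<>();

  HeartbeatTracker(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  // Called when the Master answers reportForDuty. Registration itself starts
  // the liveness clock, so an RS that dies before its first heartbeat expires.
  synchronized void register(String server, long nowMillis) {
    lastSeen.put(server, nowMillis);
  }

  // Called on each RS heartbeat. Returns false for a server the Master does
  // not know (e.g. already expired), which the RS should treat as "stop".
  synchronized boolean heartbeat(String server, long nowMillis) {
    if (!lastSeen.containsKey(server)) {
      return false;
    }
    lastSeen.put(server, nowMillis);
    return true;
  }

  // Called periodically by the Master. Removes and returns every server not
  // heard from within the timeout; the caller would then run server shutdown
  // handling (reassigning that server's regions) for each one.
  synchronized List<String> expire(long nowMillis) {
    List<String> dead = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (nowMillis - e.getValue() > timeoutMillis) {
        dead.add(e.getKey());
        it.remove();
      }
    }
    return dead;
  }

  synchronized boolean isOnline(String server) {
    return lastSeen.containsKey(server);
  }
}
```

For example, with a 3000 ms timeout, if "rs1" and "rs2" both register at t=0, "rs1" heartbeats at t=2000 and "rs2" dies without ever heartbeating, then `expire(4000)` returns only "rs2". A later heartbeat from "rs2" returns `false`. How the timeout should relate to the heartbeat interval is left open here.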



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
