[ 
https://issues.apache.org/jira/browse/HBASE-17733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895766#comment-15895766
 ] 

Sean Busbey commented on HBASE-17733:
-------------------------------------

{quote}
In HBASE-9593, we were trying to handle the rare but possible case where the RS 
would die after registering w/ the Master but before we put up our ephemeral 
znode. In this case a RS would live in the Master's internals forever because 
there is no ephemeral znode to expire to do cleanup and removal of the 
never-started RS.
{quote}

For example, why not just solve this problem by having the Master watch for new 
ephemeral znodes and only add the RS to its internals at that point? Just 
because we need to ask the master what name it sees for us doesn't mean we have 
to use that request as the time to consider the RS fully bootstrapped.
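
A minimal sketch of that alternative, to make the ordering concrete. This is not HBase code: the class, the method names, and the `host,port,startcode` name format are all hypothetical stand-ins, and the znode callbacks are called directly instead of being driven by a real ZooKeeper watcher. The point it tries to show is that reportForDuty only answers with a name and keeps no state, so a RS that dies before creating its znode never enters the Master's internals.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of deferred registration; not the actual HBase Master. */
public class DeferredRegistrationSketch {
    // Servers the Master considers live. Populated only from znode events.
    private final Set<String> online = ConcurrentHashMap.newKeySet();

    /**
     * The RS asks the Master what name the Master sees for it.
     * Deliberately stateless: nothing is recorded here, so there is
     * nothing to clean up if the RS dies right after this call.
     * The name format is an assumption made for illustration.
     */
    public String reportForDuty(String hostname, int port, long startCode) {
        return hostname + "," + port + "," + startCode;
    }

    /** Would be invoked by a watcher when a new ephemeral znode appears. */
    public void onZnodeCreated(String serverName) {
        online.add(serverName);
    }

    /** Would be invoked by a watcher when the ephemeral znode expires. */
    public void onZnodeDeleted(String serverName) {
        online.remove(serverName);
    }

    public boolean isOnline(String serverName) {
        return online.contains(serverName);
    }

    public static void main(String[] args) {
        DeferredRegistrationSketch master = new DeferredRegistrationSketch();

        // RS learns its name, but the Master does not track it yet.
        String name = master.reportForDuty("rs1.example.org", 16020, 1L);
        System.out.println("after reportForDuty: " + master.isOnline(name));

        // Only the znode showing up makes the RS part of the Master's view.
        master.onZnodeCreated(name);
        System.out.println("after znode created: " + master.isOnline(name));

        // Znode expiry removes it again, as it does today.
        master.onZnodeDeleted(name);
        System.out.println("after znode deleted: " + master.isOnline(name));
    }
}
```

With this ordering, the HBASE-9593 case leaves nothing behind on the Master side, because a RS that never creates its znode was never added in the first place.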

> Undo registering regionservers in zk with ephemeral nodes; its more trouble 
> than its worth
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17733
>                 URL: https://issues.apache.org/jira/browse/HBASE-17733
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: stack
>
> Elsewhere, we are undoing the use of ZK (replication current WAL offset, 
> regions-in-transition, etc).
> I have another case where using ZK, while convenient (call-backs), has holes.
> The scenario is prompted by review of HBASE-9593.
> Currently, a RS registers with the Master by calling the Master's 
> reportForDuty. After the Master responds with the name we are to use for 
> ourselves (as well as other properties we need to 'run'), we then turn around 
> and do a new RPC out to the zk ensemble to register an ephemeral znode for 
> the RS.
> We notice a RS has gone away -- crashed -- because its znode evaporates and 
> the Master has a watcher triggered notifying it the RS has gone (after a zk 
> session timeout of tens of seconds).  Cumbersome (Setting watchers, zk 
> session timeouts) and indirect. Master then trips the server shutdown handler 
> which does reassign of regions from the crashed server.
> In HBASE-9593, we were trying to handle the rare but possible case where the 
> RS would die after registering w/ the Master but before we put up our 
> ephemeral znode. In this case a RS would live in the Master's internals 
> forever because there is no ephemeral znode to expire to do cleanup and 
> removal of the never-started RS.
> Let's get ZK out of the loop. Then only the Master and the RS are involved, 
> heartbeating each other.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
