[ 
https://issues.apache.org/jira/browse/HBASE-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882266#action_12882266
 ] 

Jonathan Gray commented on HBASE-2700:
--------------------------------------

My first point above I think is critical.  ZK is really not a proxy for actual 
state.  The point is not that it mirrors state 100%, but that it holds enough 
state to know what you need to know, if that makes sense.

For example, a master could die in the middle of load balancing.  Let's say that it 
sent out all the messages to RS to close regions, half of them actually 
finished closing and the master sent out opens to other RS for half of them.

When master fails over, it figures out who is in transition.  It would see all 
the regions that were being balanced as either OPENING, OPENED, CLOSING, or 
CLOSED.  I see no possibility where any region that is not properly assigned to 
a server would be missing from this.

We can then act upon those states.  If OPENED, well, we are done.  If we're 
responsible for meta updates, we would update meta and delete the node.  If 
OPENING, we would deal with it the same way we deal with a normal OPENING 
(possibly wait for some time and if nothing ever happens we try to assign 
elsewhere).  If CLOSING, same deal.  If CLOSED, that means the region was 
closed by the RS but that we are unsure whether the previous master actually 
did an assignment for it or not.  In this case, we generate a new assignment 
and assign it out to an RS.  That RS then needs to transition the CLOSED node 
to OPENING.  If the previous master actually had sent an open to someone, then 
they would also be attempting that transition from CLOSED to OPENING.  Only one 
will win, and that RS will get the region.
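
To make the per-state handling above concrete, here is a rough sketch.  All 
names in it (FailoverSketch, State, Action, onFailover, tryClaim) are made up 
for illustration and are not HBase or ZK APIs.  An in-memory AtomicReference 
stands in for the region's ZK node; against real ZK the CLOSED to OPENING 
step would presumably be a conditional write, for example a versioned 
setData, so that only one writer can succeed.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: hypothetical names, not the HBase master code.
class FailoverSketch {

    // The four transition states a new master can find in ZK.
    enum State { OPENING, OPENED, CLOSING, CLOSED }

    // What the new master decides to do for a region in each state.
    enum Action {
        FINISH_AND_DELETE_NODE, // update meta if that is the master's job, then delete the node
        WAIT_WITH_TIMEOUT,      // handle like a normal in-flight transition
        ASSIGN_TO_NEW_RS        // generate a fresh assignment
    }

    // Maps the state found after failover to the action described above.
    static Action onFailover(State state) {
        switch (state) {
            case OPENED:
                return Action.FINISH_AND_DELETE_NODE;
            case OPENING:
            case CLOSING:
                return Action.WAIT_WITH_TIMEOUT;
            case CLOSED:
                return Action.ASSIGN_TO_NEW_RS;
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }

    // An RS tries to move the node from CLOSED to OPENING.  The compare-and-set
    // stands in for a conditional ZK write (an assumption of this sketch):
    // it succeeds only if the node is still CLOSED.
    static boolean tryClaim(AtomicReference<State> node) {
        return node.compareAndSet(State.CLOSED, State.OPENING);
    }
}
```

If the old master had already sent an open to one RS and the new master 
assigns the region to another, both call tryClaim on the same node.  The 
first call flips it to OPENING and returns true; the second sees OPENING 
rather than CLOSED and returns false, so that RS backs off.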

> Handle master failover for regions in transition
> ------------------------------------------------
>
>                 Key: HBASE-2700
>                 URL: https://issues.apache.org/jira/browse/HBASE-2700
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master, zookeeper
>            Reporter: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.21.0
>
>
> To this point in HBASE-2692 tasks we have moved everything for regions in 
> transition into ZK, but we have not fully handled the master failover case.  
> This is to deal with that and to write tests for it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
