[
https://issues.apache.org/jira/browse/HBASE-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882266#action_12882266
]
Jonathan Gray commented on HBASE-2700:
--------------------------------------
I think my first point above is critical. ZK is really not a proxy for actual
state. It's not to say that it 100% mirrors state, but it holds enough state
to know what you need to know, if that makes sense.
For example, a master could die in the middle of load balancing. Let's say it
had sent out all the messages to the RSs to close regions, half of the regions
had actually finished closing, and the master had sent out opens to other RSs
for half of them. When the master fails over, the new master figures out who
is in transition. It would see all
the regions that were being balanced as either OPENING, OPENED, CLOSING, or
CLOSED. I see no possibility where any region that is not properly assigned to
a server would be missing from this.
We can then act upon those states. If OPENED, well, we are done. If we're
responsible for meta updates, we would update meta and delete the node. If
OPENING, we would deal with it the same way we deal with a normal OPENING
(possibly wait for some time and if nothing ever happens we try to assign
elsewhere). If CLOSING, same deal. If CLOSED, that means the region was
closed by the RS but that we are unsure whether the previous master actually
did an assignment for it or not. In this case, we generate a new assignment
and assign it out to an RS. That RS then needs to transition the CLOSED node
to OPENING. If the previous master actually had sent an open to someone, then
they would also be attempting that transition from CLOSED to OPENING. Only one
will win and he will get the region.
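To make the cases above concrete, here is a rough Python sketch. It is not HBase code: an in-memory dict stands in for the ZK nodes, and TransitionStore, recover, pick_server, and the action strings are made-up names for illustration only. It assumes the master is the one responsible for meta updates.

```python
from enum import Enum


class State(Enum):
    OPENING = "OPENING"
    OPENED = "OPENED"
    CLOSING = "CLOSING"
    CLOSED = "CLOSED"


class TransitionStore:
    """In-memory stand-in (assumption) for the ZK nodes of regions in transition."""

    def __init__(self):
        self.nodes = {}  # region name -> (State, server that last wrote the node)

    def compare_and_set(self, region, expected, new, server):
        """Move a node from `expected` to `new`; return False if the state differs."""
        node = self.nodes.get(region)
        if node is None or node[0] is not expected:
            return False
        self.nodes[region] = (new, server)
        return True


def recover(store, meta, pick_server, updates_meta=True):
    """Decide what a newly active master does for each region in transition."""
    actions = {}
    for region in sorted(store.nodes):  # sorted() copies the keys, so deleting is safe
        state, server = store.nodes[region]
        if state is State.OPENED:
            # Already open. If we own meta updates, record it and drop the node.
            if updates_meta:
                meta[region] = server
                del store.nodes[region]
            actions[region] = "done"
        elif state in (State.OPENING, State.CLOSING):
            # Treat like a normal in-flight transition: wait, reassign on timeout.
            actions[region] = "watch"
        else:
            # CLOSED: we cannot tell whether the old master already assigned it,
            # so generate a fresh assignment. The chosen RS must still win the
            # CLOSED -> OPENING transition before it owns the region.
            actions[region] = "assign:" + pick_server(region)
    return actions
```

If the previous master had already sent an open to rs2 and the new master picks rs3, both call compare_and_set(region, State.CLOSED, State.OPENING, ...). Whichever call lands first returns True; the other sees OPENING instead of CLOSED, gets False, and gives up the region.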
> Handle master failover for regions in transition
> ------------------------------------------------
>
> Key: HBASE-2700
> URL: https://issues.apache.org/jira/browse/HBASE-2700
> Project: HBase
> Issue Type: Sub-task
> Components: master, zookeeper
> Reporter: Jonathan Gray
> Priority: Critical
> Fix For: 0.21.0
>
>
> To this point in HBASE-2692 tasks we have moved everything for regions in
> transition into ZK, but we have not fully handled the master failover case.
> This is to deal with that and to write tests for it.