[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790109#comment-13790109
 ] 

Devaraj Das commented on HBASE-5487:
------------------------------------

bq. we still need a reliable store (ZK, system table, or master WAL). It seems 
ZK is the most scalable and best suited for the task

[~sershe], not ZK, IMHO. Let's use one of our internal stores rather than an 
external system for storing the region state. I am all for removing ZK 
altogether from HBase. One less distributed system to worry about. One less 
component to manage. We already have heartbeats from RSs to master, and region 
open/close RPCs from master to the RSs. I think we have enough communication 
already in place between the master and RSs to deal with region states. We 
also have chores in the master that try to take some actions based on 
assignment timeouts.

Would this model work (conceptually)? It's late at night here; please pardon me 
if there are glaring issues :-)

All region state manipulation operations are initiated by the master and they 
act upon the meta region. We would have extra columns in the meta table to 
store the state of the region etc. The initial rows are created by the master, 
with the region state set to UNASSIGNED. This is not new - we already do this 
but IIRC we don't store the state of the region. Some state transitions happen 
through method executions and some of those method executions are RPCs from the 
master to some regionserver. I think that the states would be more granular 
here (to prevent potential replay/repetitions of large operations). I am 
wondering whether it makes sense to update the meta table directly from the 
regionservers on region state changes, or to go via the master. The master 
shouldn't become a bottleneck if we can avoid it. A regionserver could first 
update the meta table, and then just notify the master that a certain 
transition was done; the master could then initiate the next transition 
([~eclark]'s comment about coprocessors can probably be made to apply in this 
context). An operation is considered successful only when its state change is 
recorded in meta.
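
As a rough illustration of the "recorded in meta or it didn't happen" rule, 
here is a toy Python sketch. It is not HBase code: the dict stands in for the 
meta table, the OPEN and CLOSING states and the allowed-transition map are 
assumptions made for the example, and all class and method names are 
hypothetical.

```python
from enum import Enum


class RegionState(Enum):
    UNASSIGNED = "UNASSIGNED"
    OPENING = "OPENING"
    OPEN = "OPEN"        # assumed state name, for illustration
    CLOSING = "CLOSING"  # assumed state name, for illustration


# Assumed set of legal transitions; a real design would likely be more granular.
ALLOWED = {
    RegionState.UNASSIGNED: {RegionState.OPENING},
    RegionState.OPENING: {RegionState.OPEN, RegionState.UNASSIGNED},
    RegionState.OPEN: {RegionState.CLOSING},
    RegionState.CLOSING: {RegionState.UNASSIGNED},
}


class MetaTable:
    """In-memory stand-in for the meta table: one row per region."""

    def __init__(self):
        self.rows = {}

    def create_region(self, region):
        # The master creates the initial row in state UNASSIGNED.
        self.rows[region] = {"state": RegionState.UNASSIGNED, "assignee": None}

    def transition(self, region, expected, new, assignee=None):
        """Compare-and-set on the state column.

        Returns True only if the change was recorded; callers treat anything
        else as "the operation did not happen".
        """
        row = self.rows[region]
        if row["state"] is not expected or new not in ALLOWED[expected]:
            return False
        row["state"] = new
        if assignee is not None:
            row["assignee"] = assignee
        return True


meta = MetaTable()
meta.create_region("region-1")
# Master picks an assignee and records OPENING before sending the open RPC.
meta.transition("region-1", RegionState.UNASSIGNED, RegionState.OPENING, "rs-1")
# The regionserver (or the master on its behalf) records the completed open.
meta.transition("region-1", RegionState.OPENING, RegionState.OPEN)
```

The compare-and-set shape means a stale or repeated update is simply rejected, 
which is one way to make replays of a transition harmless.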

Also, there would be a chore in the master (probably an enhanced 
catalog-janitor) that periodically goes over the meta table and restarts 
failed/stuck state transitions (along with some diagnostics, such as probing 
the regionservers in question). This chore also runs once as soon as the master 
is started and the meta region is assigned, to take care of transitions that 
were started in the previous life of the master and are now waiting for some 
action from the master. For example, if the state was OPENING for a certain 
region and the master crashed, the new master would send an openRegion RPC to 
the region assignee upon restart. The region assignee would have been recorded 
as a column in the row for the region by the previous master.
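
The restart pass of such a chore could look roughly like the toy Python sketch 
below. It is not HBase code: the meta rows are a plain dict, the callback 
stands in for the openRegion RPC, and the function and field names are 
hypothetical. It also assumes that re-sending an open request to the same 
assignee is safe to repeat.

```python
def recover_stuck_transitions(meta_rows, send_open_region):
    """Scan meta rows and re-drive transitions that were left in OPENING.

    meta_rows: dict of region name -> {"state": str, "assignee": str or None}
    send_open_region: callable(assignee, region), a stand-in for the RPC
    Returns the list of regions for which the open request was re-sent.
    """
    resent = []
    for region, row in sorted(meta_rows.items()):
        # Only OPENING rows with a recorded assignee can be resumed as-is;
        # rows without an assignee would need a fresh assignment instead.
        if row["state"] == "OPENING" and row.get("assignee"):
            send_open_region(row["assignee"], region)
            resent.append(region)
    return resent


# Example: state left behind by a master that crashed mid-assignment.
rows = {
    "region-a": {"state": "OPENING", "assignee": "rs-1"},
    "region-b": {"state": "OPEN", "assignee": "rs-2"},
}
calls = []
recover_stuck_transitions(rows, lambda rs, region: calls.append((rs, region)))
```

The same scan could run periodically with a timeout check added, so that only 
transitions that have been stuck for a while are restarted.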

I think we should also save the operations that were initiated by the client on 
the master (either in a WAL or in some system table) so that the master doesn't 
lose track of those and can execute them in the face of crashes & restarts. For 
example, the user might have sent a 'split region' operation just before the 
master crashed; the restarted master should still carry it out.
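
A toy Python sketch of such an operation log follows. It is not HBase code: an 
append-only local file of JSON lines stands in for the WAL or system table, 
and the class, method, and field names are hypothetical.

```python
import json
import os


class OperationLog:
    """Append-only log of client-initiated operations (stand-in for a WAL)."""

    def __init__(self, path):
        self.path = path

    def _append(self, entry):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the entry durable before acting on it

    def record(self, op_id, op, target):
        # Written before the master starts executing the operation.
        self._append({"id": op_id, "op": op, "target": target})

    def complete(self, op_id):
        # Written once the operation has fully finished.
        self._append({"id": op_id, "done": True})

    def pending(self):
        """Replay the log and return operations that never completed."""
        if not os.path.exists(self.path):
            return []
        ops = {}
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                if entry.get("done"):
                    ops.pop(entry["id"], None)
                else:
                    ops[entry["id"]] = entry
        return list(ops.values())
```

On startup, a new master would call `pending()` and resume whatever it 
returns, for instance a 'split region' request that was accepted but not 
finished before the crash.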

> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Priority: Critical
>         Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online schema change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core 
> components



--
This message was sent by Atlassian JIRA
(v6.1#6144)
