[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793436#comment-13793436
 ] 

Jimmy Xiang commented on HBASE-5487:
------------------------------------

[~fenghh], regarding the uncertainty due to ZK, I don't think it comes from the 
way we use it.  It is more because ZK doesn't support continuous events: you 
have to set the watch again after each event callback.  The problem is that 
after an event is triggered, by the time we try to get the data, the data could 
have changed again, so an event is missed, which causes a state jump.
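
To illustrate the missed event, here is a minimal Python simulation.  It is a 
toy model under stated assumptions, not the ZooKeeper client API: OneShotNode, 
observe, and the state names are made up for this sketch.

```python
class OneShotNode:
    """Toy stand-in for a znode whose watch fires at most once per registration."""

    def __init__(self, data):
        self.data = data
        self._watcher = None

    def get(self, watcher=None):
        # Reading the data can (re)register a one-shot watch.
        if watcher is not None:
            self._watcher = watcher
        return self.data

    def set(self, data):
        self.data = data
        # The watch is consumed by the first change after registration.
        watcher, self._watcher = self._watcher, None
        if watcher is not None:
            watcher()


def observe(initial, bursts):
    """Return the states a watcher sees when writes arrive in bursts.

    Each burst is a list of writes that all land before the watcher gets a
    chance to handle the notification, read the data, and re-arm the watch.
    """
    node = OneShotNode(initial)
    fired = []

    def on_change():
        fired.append(True)

    seen = [node.get(on_change)]
    for burst in bursts:
        for state in burst:
            node.set(state)  # only the first write in a burst fires the watch
        if fired:
            fired.clear()
            seen.append(node.get(on_change))  # read latest data, re-arm watch
    return seen
```

With two writes in one burst, observe("OFFLINE", [["OPENING", "OPENED"]]) 
returns ["OFFLINE", "OPENED"]: the watcher never sees OPENING, which is the 
state jump described above.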

Currently, we do have a region state machine.  However, the machine is not 
strict because of this ZK behavior.  We could jump over some states, which 
means the state transitions can't be strictly enforced.  If we go without ZK, 
we can have a strict state machine to follow.  That will make things much more 
predictable.
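
As a rough sketch of what "strict" could mean here, the Python below rejects 
any transition that is not in an explicit table.  The table, the state names, 
and the class names are assumptions for illustration only; they are not the 
actual HBase region states or transitions.

```python
# Hypothetical transition table: each state lists the only states it may move to.
ALLOWED = {
    "OFFLINE": {"OPENING"},
    "OPENING": {"OPEN", "OFFLINE"},
    "OPEN": {"CLOSING"},
    "CLOSING": {"CLOSED"},
    "CLOSED": {"OFFLINE"},
}


class IllegalTransition(Exception):
    """Raised when a requested transition is not in the table."""


class RegionStateMachine:
    def __init__(self, state="OFFLINE"):
        self.state = state

    def transition(self, new_state):
        # Refuse anything the table does not allow, so no state can be skipped.
        if new_state not in ALLOWED.get(self.state, set()):
            raise IllegalTransition(f"{self.state} -> {new_state}")
        self.state = new_state
        return self.state
```

In this sketch, going OFFLINE -> OPENING -> OPEN succeeds, while OFFLINE -> OPEN 
raises IllegalTransition and leaves the state unchanged.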

[~sershe], regarding the janitor, I think we don't need it.  Currently, we have 
a timeout monitor, but it is disabled and will be removed soon, I think.  
Without the monitor, ITBLL (IntegrationTestBigLinkedList) with CM (ChaosMonkey) 
runs very well.  With the 0.96 tip, I ran ITBLL with CM with aggressive region 
moving, and it was perfectly fine.  If an RS is gone, SSH 
(ServerShutdownHandler) should handle it properly and assign its regions.  If 
there is a janitor, it will compete with SSH in this case, which probably does 
more harm than good.

To make some RS serve the role of master, besides having meta on it, we can 
have some (not all, of course, to make [~jesse_yates] happy :) ) system tables 
on it too.  This way, we can support leveled region assignment, i.e. we can 
open some regions before the rest if these regions can be assigned to the 
master RS, or we can open them on this master RS first and then move them away 
later, after the system is fully started.  This applies to some special regions 
only, for sure.

Now we bundle two important modules (master + meta) in one RS.  It is critical 
to make sure it has a light load and does not die too often (even better, does 
not die at all).  So I think we should move other regions off the RS once it is 
promoted to be the master one.

I think we should allow only a list of RSs with good hardware to be master, if 
not all RS nodes have decent or identical hardware.


> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Priority: Critical
>         Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online schema change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core 
> components



--
This message was sent by Atlassian JIRA
(v6.1#6144)
