[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793436#comment-13793436
]
Jimmy Xiang commented on HBASE-5487:
------------------------------------
[~fenghh], regarding the uncertainty due to ZK, I don't think it comes from
the way we use it. It is more that ZK doesn't support continuous events: you
have to set the watch again after each event callback. The problem is that
after an event is triggered, by the time we try to get the data, the data could
have changed again, so an event is missed, which causes a state jump.
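
To make the missed-event race concrete, here is a minimal sketch in plain Java. It does not use the ZooKeeper client; ToyNode, getDataAndWatch and pollNotification are hypothetical names that only mimic the one-shot watch behavior described above, and the state names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class OneShotWatchDemo {
    /** Toy stand-in for a znode (an assumption, not the ZooKeeper API). */
    static class ToyNode {
        private String data;
        private boolean watchSet;          // a watch fires at most once
        private int pendingNotifications;  // events not yet handled by the observer

        ToyNode(String initial) { data = initial; }

        /** Writer side: change the data; fire the watch only if one is armed. */
        void setData(String newData) {
            data = newData;
            if (watchSet) {
                watchSet = false;          // the watch is consumed by this change
                pendingNotifications++;
            }
        }

        /** Observer side: read the current data and re-arm the watch. */
        String getDataAndWatch() {
            watchSet = true;
            return data;
        }

        /** Observer side: take one queued notification, if any. */
        boolean pollNotification() {
            if (pendingNotifications == 0) return false;
            pendingNotifications--;
            return true;
        }
    }

    /** Returns the sequence of states the observer actually sees. */
    static List<String> observe() {
        ToyNode node = new ToyNode("OFFLINE");
        List<String> seen = new ArrayList<>();
        seen.add(node.getDataAndWatch());  // sees OFFLINE, watch armed

        node.setData("OPENING");           // fires the watch, one notification queued
        node.setData("OPEN");              // no watch armed, so no notification

        // The observer only now gets around to handling the first event.
        while (node.pollNotification()) {
            seen.add(node.getDataAndWatch());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe());     // prints [OFFLINE, OPEN]
    }
}
```

The observer never sees OPENING: the second write lands before the watch is set again, so from its point of view the region jumps straight from OFFLINE to OPEN.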
Currently, we do have a region state machine. However, the machine is not
strict because of this ZK behavior. We could jump over some states, which means
the state transitions can't be strictly enforced. If we go without ZK, we
can have a strict state machine to follow. That will make things much more
predictable.
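
As a rough illustration of what "strict" could mean here, the sketch below rejects any transition that is not explicitly listed. The states and the transition table are simplified assumptions made for this example, not HBase's actual RegionState definition or the design in the attached document.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class StrictRegionStateMachine {
    /** Simplified, hypothetical set of region states. */
    enum State { OFFLINE, OPENING, OPEN, CLOSING, CLOSED }

    /** Allowed transitions; anything not listed here is rejected. */
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.OFFLINE, EnumSet.of(State.OPENING));
        ALLOWED.put(State.OPENING, EnumSet.of(State.OPEN, State.OFFLINE));
        ALLOWED.put(State.OPEN, EnumSet.of(State.CLOSING));
        ALLOWED.put(State.CLOSING, EnumSet.of(State.CLOSED));
        ALLOWED.put(State.CLOSED, EnumSet.of(State.OFFLINE));
    }

    private State current = State.OFFLINE;

    State current() { return current; }

    /** Moves to the next state, or throws if the step skips over a state. */
    void transition(State next) {
        if (!ALLOWED.get(current).contains(next)) {
            throw new IllegalStateException(
                "Illegal transition " + current + " -> " + next);
        }
        current = next;
    }

    public static void main(String[] args) {
        StrictRegionStateMachine region = new StrictRegionStateMachine();
        region.transition(State.OPENING);
        region.transition(State.OPEN);
        System.out.println(region.current());  // prints OPEN

        try {
            // A fresh region is OFFLINE; jumping straight to OPEN is refused.
            new StrictRegionStateMachine().transition(State.OPEN);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a single writer of region state, a check like this can be enforced on every update, which is not possible when an observer may miss intermediate states.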
[~sershe], as for the janitor, I think we don't need it. Currently, we have a
timeout monitor, but it is disabled and will be removed soon, I think. Without
the monitor, ITBLL with CM runs very well. With 0.96 tip, I tried to run ITBLL
with CM with aggressive region moving, and it is perfectly fine. If an RS is
gone, SSH should handle it properly and assign regions. If there is a janitor,
it will compete with SSH in this case, which probably does more harm than good.
If some RS serves the role of master, then besides having meta on it, we
can have some (not all, of course, to make [~jesse_yates] happy :) ) system
tables on it too. This way, we can support leveled region assignments, i.e. we
can open some regions before the rest if these regions can be assigned to the
master RS, or we can open them on this master RS first, then move them away
later after the system is fully started. This applies to some special regions
only, for sure.
Now, we bundle two important modules (master + meta) in one RS. It is critical
to make sure it has a light load and does not die too often (even better, does
not die at all). So I think we should move other regions out of the RS once
it's promoted to be the master one.
I think we should allow only a designated list of RSs with good hardware to be
master, if not all RS nodes have decent/same hardware.
> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
> Issue Type: New Feature
> Components: master, regionserver, Zookeeper
> Affects Versions: 0.94.0
> Reporter: Mubarak Seyed
> Priority: Critical
> Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant
> manner.
> Master-coordinated tasks such as online schema change and delete-range
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core
> components