[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791516#comment-13791516
]
Feng Honghua commented on HBASE-5487:
-------------------------------------
bq.Master is the Actor. Having it go across a network to get/set the 'state' in
a service that is non-transactional wasn't our smartest move.
Regionservers currently report state via ZK. Master reads it from ZK. Would be
better if the RS just reported directly to the Master.
[~stack] Yes, this is exactly what I proposed in HBASE-9726 :-)
bq.I am wondering whether it makes sense to update the meta table from the
various regionservers on the region state changes or go via the master.. But
maybe the master doesn't need to be a bottleneck if possible. A regionserver
could first update the meta table, and then just notify the master that a
certain transition was done; the master could initiate the next transition
[~devaraj] It would be better to let the master update the meta table rather
than let various regionservers do it. With the master as the single actor and
truth-maintainer, many tricky bugs/problems can be avoided. And for frequent
state changes, the regionserver serving the (state) meta table would become
the bottleneck sooner than the master issuing the update requests, so it
doesn't matter whether the update requests come from the master or from
various regionservers.
bq.I prefer not to use ZK since it's kind of the root cause of uncertainty: has
the master/region server got/processed the event? has the znode been hijacked
since the master/region server changed its mind?
We should store the state in the meta table, which is cached in memory.
Whether to use a coprocessor is not a big concern to me. If we don't use a
coprocessor, I prefer to use the master as the proxy to do all meta table
updates. Otherwise, we need to listen to something for updates.
[~jxiang] Agree. IMO ZK alone is not the root cause of uncertainty; the current
usage pattern of ZK is. The pattern where a regionserver updates state in ZK
and the master listens to ZK and updates state in its local memory accordingly
exhibits too many tricky scenarios/bugs, because a ZK watch is one-time (which
can result in missed state transitions) and the notification/processing is
asynchronous (which can lead to delayed/out-of-date state in master memory).
And by replacing ZK with the meta table, we also need to discard this 'RS
updates, master listens' pattern, since the meta table inherently lacks a
listen-notify mechanism :-).
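The missed-transition hazard of one-time watches can be shown with a toy
simulation (plain Java, NOT the real ZooKeeper client API; `ToyZnode` and
`observedStates` are made-up names for illustration): a watcher fires once and
is consumed, so the second of two back-to-back updates goes unnoticed unless
the watch is re-registered in time.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of ZooKeeper's one-time watch semantics (NOT the real client
// API; ToyZnode/observedStates are made-up names for illustration).
class OneShotWatchDemo {
    interface Watcher { void process(String newState); }

    static class ToyZnode {
        private Watcher watcher;  // one-shot: cleared after it fires once

        void setWatcher(Watcher w) { this.watcher = w; }

        void setData(String newData) {
            if (watcher != null) {
                Watcher w = watcher;
                watcher = null;          // ZK-style: the watch is consumed
                w.process(newData);
            }
        }
    }

    // The states the master actually observes for two rapid updates when it
    // does not manage to re-register the watch in between.
    static List<String> observedStates() {
        ToyZnode znode = new ToyZnode();
        List<String> seen = new ArrayList<>();
        znode.setWatcher(seen::add);   // master registers a watch once
        znode.setData("OPENING");      // fires (and consumes) the watch
        znode.setData("OPENED");       // no watch registered -> transition missed
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("znode transitions: [OPENING, OPENED]");
        System.out.println("master observed:   " + observedStates());
    }
}
```

The real client must re-register the watch inside the callback, and any change
that lands before re-registration is silently skipped, which is exactly why
the master's in-memory state can drift.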
bq.I think ZK got a bad reputation not on its own merit, but on how we use it.
I can see that problems exist but IMHO advantages outweigh the disadvantages
compared to system table.
Co-located system table, I am not so sure, but so far there's not even a
high-level design for this (for example - do all splits have to go thru
master/system table now? how does it recover? etc.).
Perhaps we should abstract an async persistence mechanism sufficiently and then
decide. Whether it would be ZK+notifications, or system table, or memory + wal,
or colocated system table, or what.
The problem is that the usage inside master of that interface would depend on
perf characteristics.
Anyway, we can work out the state transitions/concurrency/recovery without
tying 100% to particular store.
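The "abstract the persistence mechanism first, then decide" idea could look
roughly like the sketch below (hypothetical names, not an HBase API): the
master codes against a small async store interface, and ZK+notifications, a
system table, or memory+WAL would each be an implementation behind it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction for the assignment-state store; the backing
// mechanism (ZK + notifications, system table, memory + WAL, ...) is
// swapped behind it without touching master logic.
interface RegionStateStore {
    CompletableFuture<Void> setState(String region, String state); // async persist
    String getState(String region);                                // cached read
}

// Trivial in-memory implementation, standing in for any of the candidates.
class InMemoryStateStore implements RegionStateStore {
    private final Map<String, String> states = new ConcurrentHashMap<>();

    public CompletableFuture<Void> setState(String region, String state) {
        states.put(region, state);
        return CompletableFuture.completedFuture(null);  // already "durable" here
    }

    public String getState(String region) {
        return states.get(region);
    }
}
```

As the quoted comment notes, the open question is that how the master uses
this interface (batching, blocking vs. async) would still depend on the perf
characteristics of the chosen backend.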
[~sershe] Agree on "ZK got a bad reputation not on its own merit, but on how we
use it", especially if you mean that the master currently relies on ZK
watch/notification to maintain/update its in-memory region state. IMO this is
the biggest root cause of problems in the current assignment design. If we used
ZK the same way as we would use the meta table for storing states, it would
make no big difference whether the states are stored in ZK or the meta table
(except that the meta table can perform much better when restarting a big
cluster with a large number of regions), right? But using ZK's update/listen
pattern does make the difference.
bq.btw, any input on actor model?
Things queue up operations/notifications ("ops") for master; "AM" runs on a
timer or when the queue is non-empty, having as inputs the cluster state (incl.
ongoing internal actions it ordered before, e.g. OPENING state for a region)
plus new ops from the queue, on a single thread; it generates new actions (not
physically doing anything, e.g. talking to RS); the ops state and cluster state
are persisted; then actions are executed on different threads (e.g. messages
sent to RS-es, etc.), and "AM" runs again, or sleeps for some time if the ops
queue is empty.
That is a different model, not sure if it scales for large clusters.
[~sershe] "operations/notifications" means the RS reports action progress to
the master? The master is the single point that updates the state "truth" (to
the meta table), and the RS doesn't know where the states are stored and
doesn't access them directly, right? I think a communication/storage diagram
would help a lot for an overall clear understanding here :-)
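For reference, the single-threaded "AM" loop described in the quoted comment
can be sketched as follows (all names here - `ActorAmSketch`, `runOnce`, etc. -
are hypothetical, not HBase code): ops queue up, one thread drains them and
updates the cluster state, and the resulting actions are handed back for
execution on other threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Minimal sketch of the single-threaded "AM" loop described above.
// Hypothetical names, not HBase code: ops queue up; one thread drains the
// queue, mutates cluster state, and emits actions; a real system would then
// persist the state and run the actions (RPCs to RSes) on other threads.
class ActorAmSketch {
    private final Queue<String[]> opQueue = new ArrayDeque<>();     // {region, event}
    private final Map<String, String> clusterState = new HashMap<>();

    void enqueueOp(String region, String event) {
        opQueue.add(new String[] { region, event });
    }

    // One AM iteration on a single thread: inputs are the cluster state plus
    // new ops; output is the list of actions to execute elsewhere.
    List<String> runOnce() {
        List<String> actions = new ArrayList<>();
        String[] op;
        while ((op = opQueue.poll()) != null) {
            String region = op[0], event = op[1];
            clusterState.put(region, event);
            if ("OFFLINE".equals(event)) {
                // AM orders the next transition but does not perform it here.
                clusterState.put(region, "OPENING");
                actions.add("sendOpenRpc(" + region + ")");
            }
        }
        // A real implementation would persist clusterState + pending ops here.
        return actions;
    }

    String stateOf(String region) {
        return clusterState.get(region);
    }
}
```

Because only the `runOnce` thread ever touches the state, the actor model
avoids the concurrent-update races of the watch/listen pattern; the scaling
question raised above is whether one such thread keeps up on large clusters.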
> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
> Issue Type: New Feature
> Components: master, regionserver, Zookeeper
> Affects Versions: 0.94.0
> Reporter: Mubarak Seyed
> Priority: Critical
> Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant
> manner.
> Master-coordinated tasks such as online-scheme change and delete-range
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of framework are
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core
> components
--
This message was sent by Atlassian JIRA
(v6.1#6144)