[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794862#comment-13794862
 ] 

Eric Newton commented on HBASE-5487:
------------------------------------

Accumulo does manage tablet (region) assignment tracking through the metadata 
table, and further, uses a distributed state machine to scale up a little 
beyond a single master node. I have been meaning to write it up, but I've not 
had a chance.

I've not kept up with every HBase improvement, so I don't know whether this is 
pertinent, but the Accumulo metadata table is typically spread out over 
50 - 100% of the available tablet (region) servers.

Still, the metadata table, and especially the root table (or tablet), is subject to 
hot-spotting on large map/reduce jobs where hundreds (or thousands) of clients 
are learning tablet locations at the same time.  Block caching is important, 
but at some point massive numbers of simultaneous RPC requests to a single node 
cause delays, or even timeouts and failures.

But using Accumulo to store Accumulo state has scaled well.

Accumulo has two frameworks for master tasks:

* master general state processing: a table should be online, assignments are 
recorded and servers repeatedly informed
* FATE processing, where multi-stage operations are saved, executed and 
progress is re-recorded

The first is general maintenance: keeping the system running.  Tablets are 
assigned, unassigned and, in general, balanced.
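
As a rough illustration of the first framework (a sketch only, not Accumulo's 
actual code), one pass of such a maintenance loop might look like the 
following. The `reconcile` function, the dict standing in for the metadata 
table, and the least-loaded placement rule are all assumptions made for the 
example.

```python
from typing import Dict, List, Optional, Tuple


def reconcile(
    assignments: Dict[str, Optional[str]],
    live_servers: List[str],
) -> List[Tuple[str, str]]:
    """Run one pass of a hypothetical assignment loop for an online table.

    `assignments` stands in for the metadata table: tablet -> server or None.
    Returns the (tablet, server) load requests to send on this pass.
    """
    if not live_servers:
        return []  # nothing can be assigned until a server is available

    # Count tablets currently recorded on each live server.
    load = {server: 0 for server in live_servers}
    for server in assignments.values():
        if server in load:
            load[server] += 1

    messages = []
    for tablet in sorted(assignments):
        server = assignments[tablet]
        if server not in load:
            # Unassigned, or recorded on a server that is no longer live:
            # pick the least-loaded live server (an assumed policy).
            server = min(sorted(load), key=lambda s: load[s])
            assignments[tablet] = server  # record the assignment first
            load[server] += 1
        # Servers are informed on every pass, so a lost message is harmless.
        messages.append((tablet, server))
    return messages


if __name__ == "__main__":
    table = {"t1": None, "t2": "s1", "t3": "dead-server"}
    print(reconcile(table, ["s1", "s2"]))
    print(table)
```

Because every pass re-derives its messages from the recorded state, the loop 
can be restarted at any point without special recovery logic.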

The second allows for temporary deviations from that steady state: tablets are 
taken offline for a merge, for example.  The allocation of resources and the 
changes of state are walked through step by step, each step recording its 
progress in ZooKeeper.
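
The second framework can be sketched in the same spirit. This is a minimal 
illustration under stated assumptions, not the FATE implementation: 
`InMemoryStore` is a hypothetical stand-in for ZooKeeper, and `run_operation` 
and its `fail_at` parameter exist only to demonstrate resuming after a crash.

```python
from typing import Callable, Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for ZooKeeper: maps operation id -> next step."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}

    def get(self, key: str, default: int) -> int:
        return self.data.get(key, default)

    def put(self, key: str, value: int) -> None:
        self.data[key] = value


def run_operation(
    store: InMemoryStore,
    op_id: str,
    steps: List[Callable[[], None]],
    fail_at: Optional[int] = None,
) -> int:
    """Run `steps` in order, recording progress after each one.

    A restarted master calls this again with the same `op_id` and picks up
    at the first unrecorded step. A crash between running a step and
    recording it would re-run that step, so steps should be idempotent.
    """
    for index in range(store.get(op_id, 0), len(steps)):
        if index == fail_at:
            raise RuntimeError("simulated master crash")
        steps[index]()
        store.put(op_id, index + 1)  # progress is re-recorded after each step
    return store.get(op_id, 0)


if __name__ == "__main__":
    store = InMemoryStore()
    log: List[str] = []
    merge_steps = [
        lambda: log.append("take tablets offline"),
        lambda: log.append("merge metadata"),
        lambda: log.append("bring tablet online"),
    ]
    try:
        run_operation(store, "merge-1", merge_steps, fail_at=1)
    except RuntimeError:
        pass  # a new master would take over here
    run_operation(store, "merge-1", merge_steps)
    print(log)
```

Keeping the recorded progress outside the master process is what lets a new 
master finish an operation that the previous one started.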



> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Priority: Critical
>         Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online schema change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core 
> components



