[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796362#comment-13796362 ]

Enis Soztutar commented on HBASE-5487:
--------------------------------------

I also started a document some time ago, but never got to finish it to the 
level of detail I would like. However, I think we can agree on the design 
goals section, which I augmented from the discussion so far:

- Robust implementation
- Comprehensive test coverage by mocking server and region assignment states 
(unit testable without MiniCluster and CM stuff)
- Bulk region operations
- Region operations should be isolated from server operations (AM vs. SSH, log 
splitting), table operations (disabling/disabled tables, schema changes, 
etc.), and cluster shutdown. AM and SSH should NEVER know about table state 
(disabling/disabled). Server liveness checks can only be done as an optimization 
(servers can fail after the check is done).
- There should be a single source of truth for region state
- Should be compatible with master failover and concurrent region 
operations (split, RS failover, balancer, etc.)
- AM should guarantee that a region is hosted by at most one region server at 
any given time
- AM should be understandable by simple human beings like myself
- AM actions should be logged (possibly separately). We would like to be 
able to reconstruct the history of each region from logs or some persisted 
state.
- Assignment should be performant and parallelizable. We should target handling 
millions of regions and thousands of servers. A single region assignment should 
complete in under 1 sec. (1 PB of data with 1 GB regions = 1M regions)
- No master abort when a region’s state cannot be determined. Aborting results in 
support cases where the master cannot start, and without a master things become even 
worse. We should “quarantine” the regions if absolutely needed.
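
Several of these goals (a single source of truth, at most one hosting server per region, a reconstructable per-region history, and quarantining instead of aborting) can be illustrated together. The following is a minimal sketch, assuming an in-memory map in place of whatever persisted store a real implementation would use; the names `RegionStateStore`, `RegionState`, `State`, `transition` and `quarantine` are hypothetical and are not HBase's AssignmentManager API. Every change goes through one compare-and-set method that checks a table of legal transitions and appends to a log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical region states for this sketch (not HBase's own state enum).
enum State { OFFLINE, OPENING, OPEN, CLOSING, QUARANTINED }

// Immutable snapshot of one region's state; server is null when OFFLINE or QUARANTINED.
final class RegionState {
    final State state;
    final String server;

    RegionState(State state, String server) {
        this.state = state;
        this.server = server;
    }

    @Override
    public String toString() {
        return state + (server == null ? "" : "@" + server);
    }
}

// Single in-memory source of truth; every change is a checked transition and is logged.
final class RegionStateStore {
    private final Map<String, RegionState> states = new HashMap<>();
    private final List<String> history = new ArrayList<>();

    // Legal transitions; QUARANTINED has none, so such regions stay parked.
    private static boolean legal(State from, State to) {
        switch (from) {
            case OFFLINE: return to == State.OPENING;
            case OPENING: return to == State.OPEN || to == State.OFFLINE; // opened, or open failed
            case OPEN: return to == State.CLOSING;
            case CLOSING: return to == State.OFFLINE;
            default: return false;
        }
    }

    synchronized RegionState get(String region) {
        return states.getOrDefault(region, new RegionState(State.OFFLINE, null));
    }

    // Compare-and-set: applies only if the region is in 'expected' and the move is legal.
    // 'server' is the assignment target for OPENING and must match the current owner
    // for OPEN and CLOSING, so a region is never owned by two servers at once.
    synchronized boolean transition(String region, State expected, State next, String server) {
        RegionState cur = get(region);
        if (cur.state != expected || !legal(cur.state, next)) {
            return false;
        }
        String owner;
        if (next == State.OPENING) {
            if (server == null) {
                return false;
            }
            owner = server;
        } else if (next == State.OFFLINE) {
            owner = null;
        } else if (Objects.equals(cur.server, server)) {
            owner = cur.server;
        } else {
            return false;
        }
        record(region, cur, new RegionState(next, owner), null);
        return true;
    }

    // Instead of aborting the master when the state cannot be determined, park the region.
    synchronized void quarantine(String region, String reason) {
        record(region, get(region), new RegionState(State.QUARANTINED, null), reason);
    }

    // Copy of the transition log for all regions, in the order applied;
    // one region's history is the subset of entries that start with its name.
    synchronized List<String> history() {
        return new ArrayList<>(history);
    }

    private void record(String region, RegionState from, RegionState to, String reason) {
        states.put(region, to);
        history.add(region + ": " + from + " -> " + to + (reason == null ? "" : " (" + reason + ")"));
    }
}
```

Because a region can only enter OPENING from OFFLINE, and only the owning server can advance it to OPEN or CLOSING, a second concurrent assignment attempt simply fails instead of producing a double assignment. In this sketch no transition out of QUARANTINED is legal, so such regions wait for manual handling rather than taking the master down.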


> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Priority: Critical
>         Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online schema change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core 
> components
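
The framework described in the issue can be sketched as follows, assuming a task is an ordered list of idempotent steps and that progress is persisted after each completed step. The names `TaskStep` and `TaskCoordinator` are hypothetical, and the `Map` passed to the constructor only stands in for a durable store such as a ZooKeeper znode; it is not durable itself. New task types are added through `register`, without touching the coordinator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One idempotent step of a master-coordinated task (hypothetical interface).
interface TaskStep {
    void run(String target) throws Exception;
}

// Hypothetical coordinator: runs registered task types step by step and records progress
// after each completed step, so a task can be resumed rather than restarted.
final class TaskCoordinator {
    private final Map<String, List<TaskStep>> registry = new HashMap<>();
    // taskId -> number of completed steps; stands in for a durable store such as a znode.
    private final Map<String, Integer> progress;

    TaskCoordinator(Map<String, Integer> progressStore) {
        this.progress = progressStore;
    }

    // New task types plug in here without changes to the coordinator.
    void register(String taskType, List<TaskStep> steps) {
        registry.put(taskType, new ArrayList<>(steps));
    }

    // Runs the remaining steps; returns true once every step of the task has completed.
    boolean runTask(String taskId, String taskType, String target) {
        List<TaskStep> steps = registry.get(taskType);
        if (steps == null) {
            throw new IllegalArgumentException("unknown task type: " + taskType);
        }
        for (int i = progress.getOrDefault(taskId, 0); i < steps.size(); i++) {
            try {
                steps.get(i).run(target);
            } catch (Exception e) {
                return false; // completed steps stay recorded; the next attempt resumes at step i
            }
            progress.put(taskId, i + 1);
        }
        return true;
    }
}
```

Because progress is recorded after each step, a second coordinator sharing the same progress store (for example, a new active master after failover) resumes from the first unfinished step. This is also why the steps must be idempotent: a step that fails part-way is run again from the start.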



--
This message was sent by Atlassian JIRA
(v6.1#6144)
