[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796362#comment-13796362
]
Enis Soztutar commented on HBASE-5487:
--------------------------------------
I also started a document some time ago, but never got to finish it to the
level of detail I would like. However, I think we can agree on the design
goals section, which I augmented from the discussion so far:
- Robust implementation
- Comprehensive test coverage by mocking server and region assignment states
(unit testable without MiniCluster and CM stuff)
- Bulk region operations
- Region operations should be isolated from server operations (AM vs SSH, log
splitting), and table operations (disabling / disabled table, schema changes,
etc) and cluster shutdown. AM and SSH should NEVER know about table state
(disable/disabling). Server liveness checks can only be done as an optimization
(servers can fail after the check is done)
- There should be one source of truth
- Should be compatible with master failover, and concurrent region
operations (split, RS failover, balancer, etc)
- AM should guarantee that a region is hosted by at most one region server at
any given time
- AM should be understandable by simple human beings like myself
- Actions for AM should be logged (possibly separately). We would like to be
able to construct the history for the regions from logs or some persisted
state.
- Assignment should be performant and parallelizable. We should target handling
millions of regions and thousands of servers. A single region assignment should
complete in under 1 sec. (1 PB of data with 1 GB regions = 1M regions)
- No master abort when a region’s state cannot be determined. This results in
support cases where the master cannot start, and without a master things become
even worse. We should “quarantine” the regions if absolutely needed.
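
To make a few of these goals concrete (single host per region, logged actions,
quarantine instead of abort, unit testable without MiniCluster), here is a
minimal in-memory sketch. Everything in it is hypothetical: the class
RegionStateSketch, its states, and its method names are assumptions made for
illustration and are not HBase's AssignmentManager API. It also assumes that a
region with no recorded state is OFFLINE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from HBase.
class RegionStateSketch {
    enum State { OFFLINE, OPENING, OPEN, QUARANTINED }

    private final Map<String, State> states = new HashMap<>();
    private final Map<String, String> hosts = new HashMap<>();
    private final List<String> history = new ArrayList<>();

    // Start an assignment. Only an OFFLINE region may be assigned, so a
    // region never has two hosting servers recorded at the same time.
    synchronized boolean beginAssign(String region, String server) {
        State current = stateOf(region);
        if (current != State.OFFLINE) {
            history.add(region + ": assign to " + server + " rejected, state=" + current);
            return false;
        }
        states.put(region, State.OPENING);
        hosts.put(region, server);
        history.add(region + ": OFFLINE -> OPENING on " + server);
        return true;
    }

    // Record that the chosen server finished opening the region.
    synchronized boolean markOpened(String region, String server) {
        if (stateOf(region) != State.OPENING || !server.equals(hosts.get(region))) {
            history.add(region + ": unexpected open report from " + server);
            return false;
        }
        states.put(region, State.OPEN);
        history.add(region + ": OPENING -> OPEN on " + server);
        return true;
    }

    // Park a region whose state cannot be determined, instead of aborting.
    synchronized void quarantine(String region, String reason) {
        states.put(region, State.QUARANTINED);
        hosts.remove(region);
        history.add(region + ": quarantined (" + reason + ")");
    }

    // Assumption: a region with no recorded state is treated as OFFLINE.
    synchronized State stateOf(String region) {
        return states.getOrDefault(region, State.OFFLINE);
    }

    synchronized String hostOf(String region) {
        return hosts.get(region);
    }

    // Copy of the per-action log, usable to reconstruct a region's history.
    synchronized List<String> history() {
        return new ArrayList<>(history);
    }
}
```

Because the sketch has no cluster dependencies, a plain unit test can drive it
through conflicting assignments and check that the second one is rejected. A
real implementation would also need persistence and master failover handling,
which this deliberately leaves out.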
> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
> Issue Type: New Feature
> Components: master, regionserver, Zookeeper
> Affects Versions: 0.94.0
> Reporter: Mubarak Seyed
> Priority: Critical
> Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant
> manner.
> Master-coordinated tasks such as online schema change and delete-range
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core
> components