[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791520#comment-13791520
 ] 

Feng Honghua commented on HBASE-5487:
-------------------------------------

Since HBASE-9726 was closed as a duplicate of this issue, I have copied its
proposal here for discussion and reference:

The current assignment process (and likewise the split process) relies on ZK
for communication between the master and regionservers. This pattern has two
drawbacks:
  1. For clusters with a large number of regions (say, 10K-100K), ZK becomes
the bottleneck during cluster restart, because assignment/split status and
progress are stored in ZK and ZK's write throughput is limited.
  2. Since ZK watches are one-time triggers and event notification/processing
is asynchronous, there is no guarantee that the master (the watcher) is
notified of the up-to-date status/progress in time. The master therefore
relies on idempotence for its correctness, which makes the logic and code
very hard to understand and maintain.
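The one-time-watch problem in drawback 2 can be illustrated with a small simulation. This is a hypothetical in-memory model (`FakeZNode`, the handler, and the state names are made up for illustration) and does not use the ZooKeeper client API; it only mimics the rule that a watch fires once and must be re-armed by reading the node again. Two quick writes by the regionserver produce a single notification, so the master never observes the intermediate state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class OneShotWatchDemo {

    // Hypothetical stand-in for a znode; NOT the ZooKeeper client API.
    static final class FakeZNode {
        private String data;
        private Runnable watch; // one-shot: cleared as soon as it is triggered
        private final Queue<Runnable> eventQueue = new ArrayDeque<>();

        FakeZNode(String initial) { this.data = initial; }

        // Read the current value and (re)arm a one-shot watch.
        String getDataAndWatch(Runnable onChange) {
            this.watch = onChange;
            return data;
        }

        // Writer side: update the data; an armed watch fires at most once.
        void setData(String newData) {
            this.data = newData;
            if (watch != null) {
                eventQueue.add(watch); // delivered asynchronously, later
                watch = null;          // later writes find no watch armed
            }
        }

        // Simulates the event thread delivering queued notifications.
        void deliverEvents() {
            while (!eventQueue.isEmpty()) {
                eventQueue.poll().run();
            }
        }
    }

    // Returns every state the master (the watcher) actually observed.
    static List<String> observedStates() {
        FakeZNode node = new FakeZNode("OFFLINE");
        List<String> seen = new ArrayList<>();
        // Master: when notified, re-read the node and re-arm the watch.
        Runnable[] handler = new Runnable[1];
        handler[0] = () -> seen.add(node.getDataAndWatch(handler[0]));
        seen.add(node.getDataAndWatch(handler[0]));

        // Regionserver: two quick transitions before the master is notified.
        node.setData("OPENING");
        node.setData("OPENED");

        node.deliverEvents();
        return seen;
    }

    public static void main(String[] args) {
        // OPENING is never observed by the master.
        System.out.println(observedStates());
    }
}
```

Because the handler may only ever see the latest value, it has to cope with jumping from OFFLINE straight to OPENED, which is the idempotence burden the drawback describes.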

The new assignment design proposal is as follows:
  1. Assignment/split status and progress are stored in a system table (say,
'assignTable'), like the meta table, rather than in ZK. This improves write
throughput and hence restart performance for clusters with a large number of
regions.
  2. The communication pattern for assignment/split changes as follows: the
master talks directly to regionservers (the master issues assign requests to a
regionserver, and the regionserver reports assignment progress back to the
master) and records the status/progress of each assignment/split in
'assignTable'. If the master fails, the new active master reads 'assignTable'
to rebuild its knowledge of the ongoing assignment/split tasks and continues
from there. (Regionservers do not write to 'assignTable'.)
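A minimal sketch of the proposed flow, assuming an in-memory map in place of 'assignTable' and a hypothetical three-state lifecycle (class, method, state and region names are illustrative, not HBase APIs): the master persists each assignment before acting, records the progress reports regionservers send to it, and a new active master rebuilds the set of in-flight assignments purely from the table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignTableSketch {

    // Hypothetical per-region assignment states; not HBase's RegionState.
    enum State { PENDING_OPEN, OPENING, OPENED }

    // In-memory stand-in for the proposed 'assignTable' system table.
    // Only the master writes to it; regionservers never do.
    static final class AssignTable {
        private final Map<String, State> rows = new LinkedHashMap<>();
        void put(String region, State s) { rows.put(region, s); }
        Map<String, State> scan() { return new LinkedHashMap<>(rows); }
    }

    static final class Master {
        private final AssignTable table;
        Master(AssignTable table) { this.table = table; }

        // Master -> RS: persist the intent; a real system would then send
        // the assign RPC to the chosen regionserver.
        void assign(String region) {
            table.put(region, State.PENDING_OPEN);
        }

        // RS -> Master: the regionserver reports progress directly to the
        // master, and the master records it in the table.
        void onProgressReport(String region, State s) {
            table.put(region, s);
        }

        // Failover: rebuild in-flight assignments from the table so they
        // can be resumed (here, every region not yet OPENED).
        List<String> recoverInFlight() {
            List<String> inFlight = new ArrayList<>();
            for (Map.Entry<String, State> e : table.scan().entrySet()) {
                if (e.getValue() != State.OPENED) {
                    inFlight.add(e.getKey());
                }
            }
            return inFlight;
        }
    }

    static List<String> runFailoverScenario() {
        AssignTable table = new AssignTable();
        Master active = new Master(table);
        active.assign("region-a");
        active.assign("region-b");
        active.onProgressReport("region-a", State.OPENED);
        active.onProgressReport("region-b", State.OPENING);
        // The active master dies; a new master reads the same table.
        Master newActive = new Master(table);
        return newActive.recoverInFlight();
    }

    public static void main(String[] args) {
        System.out.println(runFailoverScenario()); // prints [region-b]
    }
}
```

In this sketch each row has a single writer (the master), so failover reduces to a table scan instead of reconstructing state from ZK watch events.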

> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Priority: Critical
>         Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online-schema change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core 
> components
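One possible shape for such a framework, sketched under stated assumptions: each task type is a plug-in registered by name and expressed as an ordered list of steps, and the framework records the next step index in a durable store (here, plain maps standing in for something like a system table) so a new active master can resume where the old one stopped. All names, including the delete-range steps, are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MasterTaskFramework {

    // A pluggable master-coordinated task: an ordered list of steps.
    // A step can re-run if the master dies after executing it but before
    // its progress is recorded, so steps should be idempotent.
    interface MasterTask {
        List<Runnable> steps();
    }

    static final class Framework {
        private final Map<String, MasterTask> registry = new HashMap<>();
        // Stand-ins for durable state shared across master instances.
        private final Map<String, Integer> progress; // task id -> next step index
        private final Map<String, String> taskTypes; // task id -> task type

        Framework(Map<String, Integer> progress, Map<String, String> taskTypes) {
            this.progress = progress;
            this.taskTypes = taskTypes;
        }

        // New task types plug in here without changing framework code.
        void register(String type, MasterTask task) {
            registry.put(type, task);
        }

        void submit(String taskId, String type) {
            taskTypes.put(taskId, type);
            progress.put(taskId, 0);
        }

        // Runs at most maxSteps remaining steps (maxSteps lets the demo
        // simulate a crash). Returns true once every step has completed.
        boolean run(String taskId, int maxSteps) {
            List<Runnable> steps = registry.get(taskTypes.get(taskId)).steps();
            int next = progress.get(taskId);
            for (int executed = 0; next < steps.size() && executed < maxSteps; executed++) {
                steps.get(next).run();
                next++;
                progress.put(taskId, next); // record progress after each step
            }
            return next == steps.size();
        }
    }

    static List<String> runScenario() {
        List<String> log = new ArrayList<>();
        // Hypothetical steps for a delete-range task.
        MasterTask deleteRange = () -> {
            List<Runnable> s = new ArrayList<>();
            s.add(() -> log.add("close regions"));
            s.add(() -> log.add("delete region dirs"));
            s.add(() -> log.add("update meta"));
            return s;
        };
        Map<String, Integer> progress = new HashMap<>();
        Map<String, String> taskTypes = new HashMap<>();

        Framework first = new Framework(progress, taskTypes);
        first.register("delete-range", deleteRange);
        first.submit("task-1", "delete-range");
        first.run("task-1", 2); // the master fails after two steps

        // A new active master resumes from the recorded progress.
        Framework second = new Framework(progress, taskTypes);
        second.register("delete-range", deleteRange);
        boolean done = second.run("task-1", Integer.MAX_VALUE);
        log.add("done=" + done);
        return log;
    }

    public static void main(String[] args) {
        // prints [close regions, delete region dirs, update meta, done=true]
        System.out.println(runScenario());
    }
}
```

Each step runs exactly once across the two masters in this run because progress is recorded after every step; the core only knows about steps and progress, not about any particular task type.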



