[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616129#comment-13616129 ]

Jonathan Hsieh commented on HBASE-5487:
---------------------------------------

To do a major overhaul, we need something stronger than "the code is hard to
read". I agree that it is hard to follow (see:
http://people.apache.org/~jmhsieh/hbase/120905-hbase-assignment.pdf), but it
seems to basically work, which is a pretty strong argument for keeping it.
Let's compare the two and point out what is wrong or broken in the current
implementation and how the new design avoids those problems.

The spreadsheet link is my first step toward enumerating the semantics and
distilling the set of possible problems and the races being guarded against.
Any major-overhaul solution should make sure that these operations, when
issued concurrently, interact according to a sane set of semantics in the
face of failures.

bq. Only for the current document version... tables could be added

So I buy open/close as a region operation. Split/merge are multi-region
operations -- is there enough state to recover from a failure?

So alter table is a region operation? Why isn't it in the state machine? 
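One way to make the alter-table question concrete: if every operation has to be expressed as legal transitions of a region state machine, an alter could be modeled as a close followed by a reopen. This is a minimal Python sketch assuming a hypothetical, simplified set of four region states; it is not HBase's actual region state enum, and `alter_region` is an illustrative name only.

```python
from enum import Enum


class RegionState(Enum):
    # Hypothetical, simplified states for illustration only.
    CLOSED = "closed"
    OPENING = "opening"
    OPEN = "open"
    CLOSING = "closing"


# Legal moves: a stable state (CLOSED, OPEN) can only enter a transitional
# state, and a transitional state either completes or falls back.
TRANSITIONS = {
    RegionState.CLOSED: {RegionState.OPENING},
    RegionState.OPENING: {RegionState.OPEN, RegionState.CLOSED},  # CLOSED = failed open
    RegionState.OPEN: {RegionState.CLOSING},
    RegionState.CLOSING: {RegionState.CLOSED},
}


def transition(current: RegionState, target: RegionState) -> RegionState:
    """Return the new state, or raise ValueError for an illegal move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def alter_region(state: RegionState) -> RegionState:
    """Hypothetical: model a schema alter on one region as close then reopen,
    so every intermediate step is a legal state-machine transition."""
    for step in (RegionState.CLOSING, RegionState.CLOSED,
                 RegionState.OPENING, RegionState.OPEN):
        state = transition(state, step)
    return state
```

Under this model an alter on a region that is not currently open is rejected by the state machine rather than silently racing with an open or close.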

bq. Hmm... that would require implementing region locks, and having a very
large cluster. I am talking more about unacceptable blocking of user
operations, and management of expiring locks in presence of real-life failures.

Implementing region locks is going too far -- I'm asking for some
back-of-the-napkin discussion. I think we need some measurements of how much
throughput we can get from ZK (or from a ZK-based lock implementation) and
compare this with (# of RS watchers) * (# of regions) * (# of ops).
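The napkin comparison can be written down directly. This is a minimal sketch assuming the product is read as a rate: every operation on every region fires a watch on every watching region server. The function names are hypothetical, and the throughput argument is a placeholder that must come from an actual ZK measurement; no real figures are implied.

```python
def zk_event_rate(rs_watchers: int, regions: int,
                  ops_per_region_per_sec: float) -> float:
    """Back-of-napkin ZK notification rate: every op on every region is
    assumed to fire a watch on every region server that watches it."""
    return rs_watchers * regions * ops_per_region_per_sec


def zk_load_ratio(rs_watchers: int, regions: int,
                  ops_per_region_per_sec: float,
                  measured_zk_ops_per_sec: float) -> float:
    """Estimated demand divided by measured ZK throughput.
    A result >= 1.0 means ZK would be saturated under these assumptions."""
    if measured_zk_ops_per_sec <= 0:
        raise ValueError("measured throughput must be positive")
    demand = zk_event_rate(rs_watchers, regions, ops_per_region_per_sec)
    return demand / measured_zk_ops_per_sec
```

For example, `zk_load_ratio(100, 10_000, 0.01, 20_000.0)` gives 0.5 with those purely illustrative inputs; the point of the exercise is to plug in measured numbers instead.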

The current regions-in-transition (RIT) code basically assumes that an absent
znode means the region is either closed or opened. RIT znodes are present when
the region is in one of the in-between states (opening, closing,
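The RIT convention can be sketched as a small interpretation function. This is an illustrative Python sketch, not HBase code: the function name is hypothetical, and the set of transitional states is deliberately limited to the two named in the comment. It shows the ambiguity that the absent-znode assumption leaves behind.

```python
from typing import Optional

# Transitional states named in the comment; the list in the text is cut off,
# so this set is deliberately incomplete.
TRANSITIONAL_STATES = {"opening", "closing"}


def interpret_rit_znode(znode_state: Optional[str]) -> str:
    """Interpret a region's RIT znode under the current RIT assumption.

    None means no znode exists. The region is then assumed stable, but
    "open" vs "closed" cannot be told apart from ZK alone and would have to
    be resolved from another source (for example META).
    """
    if znode_state is None:
        return "stable: open or closed"
    if znode_state in TRANSITIONAL_STATES:
        return "in transition: " + znode_state
    raise ValueError(f"unknown RIT state: {znode_state}")
```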

bq. You mean like WAL for operations?

Yeah, we could call it an "intent" log.  It would have info so that a promoted 
backup master can look in one place and complete an operation started by the 
downed original master.
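A minimal sketch of such an intent log, assuming an in-memory stand-in for whatever durable store would actually hold it (the class and method names are hypothetical). The invariant is write-ahead: the intent is recorded before the master acts, and a promoted backup master redoes every entry that was never marked complete.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class IntentLog:
    """Hypothetical intent log. In a real system `entries` and `done` would
    live in durable storage; here they are plain in-memory containers."""
    entries: Dict[int, str] = field(default_factory=dict)  # op id -> intent
    done: Set[int] = field(default_factory=set)
    next_id: int = 0

    def begin(self, intent: str) -> int:
        """Record the intent *before* the master starts acting on it."""
        op_id = self.next_id
        self.next_id += 1
        self.entries[op_id] = intent
        return op_id

    def complete(self, op_id: int) -> None:
        """Mark an operation finished once all of its effects are applied."""
        self.done.add(op_id)

    def pending(self) -> List[str]:
        """Intents that were started but never marked complete, in order."""
        return [intent for op_id, intent in sorted(self.entries.items())
                if op_id not in self.done]


def recover(log: IntentLog, redo: Callable[[str], None]) -> None:
    """Run by a promoted backup master. `redo` must be idempotent, since the
    downed master may have applied some or all of the operation already."""
    for op_id, intent in sorted(log.entries.items()):
        if op_id not in log.done:
            redo(intent)
            log.complete(op_id)
```

The idempotency requirement on `redo` is the hard part in practice: the log only says an operation was started, not how far it got.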

bq. ... Also usually that would mean RSes won't be able to initiate operations 
(like split) - they will have to go thru master (which I would argue is ok).

I know I've suggested something like this before. Currently the RS initiates a
split and does the region open/meta changes itself. If there are errors, at
some point the master side detects a timeout. An alternative would have splits
initiated on the RS but have the master make some kind of atomic change to
meta and region state for the 3 involved regions (parent, daughter a and
daughter b).
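A minimal sketch of that alternative, assuming a hypothetical store that supports a conditional multi-region update. The class, function, and state strings are illustrative, a process-local lock stands in for whatever atomic primitive the real store would provide, and the daughters jump straight to "open" for brevity.

```python
import threading
from typing import Dict

ABSENT = "absent"


class RegionStateStore:
    """Hypothetical stand-in for META plus region state. One lock makes a
    multi-region change all-or-nothing within this process; a real master
    would need an atomic primitive in its durable store instead."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._states: Dict[str, str] = {}

    def get(self, region: str) -> str:
        with self._lock:
            return self._states.get(region, ABSENT)

    def put(self, region: str, state: str) -> None:
        with self._lock:
            self._states[region] = state

    def apply_atomically(self, expected: Dict[str, str],
                         updates: Dict[str, str]) -> bool:
        """Apply all updates only if every region is in its expected state;
        otherwise change nothing and return False."""
        with self._lock:
            for region, state in expected.items():
                if self._states.get(region, ABSENT) != state:
                    return False
            self._states.update(updates)
            return True


def master_complete_split(store: RegionStateStore, parent: str,
                          daughter_a: str, daughter_b: str) -> bool:
    """The RS only requests the split; the master flips the parent and both
    daughters in a single step, so no partial split is ever visible."""
    return store.apply_atomically(
        expected={parent: "open", daughter_a: ABSENT, daughter_b: ABSENT},
        updates={parent: "split", daughter_a: "open", daughter_b: "open"},
    )
```

Because the update is conditional, a retried or duplicated split request finds the parent already split and becomes a no-op instead of corrupting state.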

bq. Depends on where we store it, but yeah these have to be transactional. Last
section (very short) suggests using ZK, which already supports that.

We need to be careful about ZK -- since it is also a network connection,
exceptions could be real failures or timeouts (where the operation succeeded
but the ack was lost). We should describe the properties (durable vs.
erasable) and the assumptions (if the wipeable ZK is the source of truth, how
do we make sure the version state is recoverable without time travel?).
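The timeout case can be handled by making writes conditional on a version and reading back when the outcome is unknown. This is a minimal Python sketch against a hypothetical in-memory versioned store (not the ZooKeeper client API); `lose_next_ack` simulates a write that was applied but whose reply never arrived. The read-back check assumes each writer's payload is unique, for example because it embeds a writer id.

```python
from typing import Tuple


class AmbiguousResult(Exception):
    """The request may or may not have been applied (e.g. the ack was lost)."""


class FakeVersionedStore:
    """Hypothetical stand-in for a versioned znode: every successful write
    bumps the version, and writes must name the version they expect."""

    def __init__(self, data: str = "") -> None:
        self.data = data
        self.version = 0
        self.lose_next_ack = False  # simulate "succeeded but couldn't ack"

    def read(self) -> Tuple[str, int]:
        return self.data, self.version

    def write(self, data: str, expected_version: int) -> int:
        if expected_version != self.version:
            raise ValueError("version mismatch")
        self.data = data
        self.version += 1
        if self.lose_next_ack:
            self.lose_next_ack = False
            raise AmbiguousResult("write applied but ack lost")
        return self.version


def write_resolving_ambiguity(store: FakeVersionedStore, data: str,
                              expected_version: int) -> bool:
    """Return True if the store currently reflects our write."""
    try:
        store.write(data, expected_version)
        return True
    except AmbiguousResult:
        # Outcome unknown: read back. Our write landed iff the version moved
        # exactly one step past what we expected and holds our payload.
        current, version = store.read()
        return version == expected_version + 1 and current == data
    except ValueError:
        # Someone else changed the node first; caller must re-read and decide.
        return False
```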

                
> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>         Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online schema change and delete-range
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core 
> components
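A minimal sketch of the plugin idea from the issue description, assuming a hypothetical task interface, registry, and shared driver (none of these names come from HBase). Each task only supplies its per-region work; the shared driver skips regions already recorded as done, so a restarted master can resume instead of re-implementing recovery per task.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Set, Type


class MasterTask(ABC):
    """Hypothetical plugin interface for a master-coordinated task."""

    name: str = ""

    @abstractmethod
    def regions(self) -> List[str]:
        """Regions the task must touch."""

    @abstractmethod
    def execute_on_region(self, region: str) -> None:
        """Per-region work; must be idempotent so it can be retried."""


TASK_REGISTRY: Dict[str, Type[MasterTask]] = {}


def register(cls: Type[MasterTask]) -> Type[MasterTask]:
    """Plug a new task type in without touching the driver code."""
    TASK_REGISTRY[cls.name] = cls
    return cls


def run_task(task: MasterTask, completed: Set[str]) -> None:
    """Shared driver: skips regions already recorded as done."""
    for region in task.regions():
        if region not in completed:
            task.execute_on_region(region)
            completed.add(region)


@register
class DeleteRange(MasterTask):
    """Illustrative delete-range task; it only records what it would delete."""

    name = "delete-range"

    def __init__(self, regions_in_range: Iterable[str]) -> None:
        self._regions = list(regions_in_range)
        self.deleted: List[str] = []

    def regions(self) -> List[str]:
        return self._regions

    def execute_on_region(self, region: str) -> None:
        self.deleted.append(region)
```

In this sketch the `completed` set would itself need to be durable (for instance in ZK) for resumption to survive a master failover.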
