[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

Sergey Shelukhin (JIRA) Wed, 27 Mar 2013 16:57:16 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615901#comment-13615901
 ]


Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------

bq. Is the assignment manager the only "master coordinated" task in scope?
Only for the current document version... tables could be added.
bq. Instead of asserting it is not clear if table (+region) locks scale, let's 
find out.
Hmm... that would require implementing region locks, and having a very large 
cluster. I am talking more about unacceptable blocking of user operations, and 
management of expiring locks in the presence of real-life failures.
bq. Master operations and processes can clash and we should understand where we 
need concurrency control. (I'm working on a table – here's a draft distilled 
version [1], there exists an overly detailed version that I'll share once I get 
it fixed)
Comments below.
bq. Should there be a notion of queuing operations? (locking, or an actual 
queue) Should these operations be generically logged so they can complete if a 
master goes down in the middle? (ex: master goes down during a "move" operation 
after the close but before the open on the new rs).
You mean like a WAL for operations?
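
To make the "WAL for operations" idea concrete, here is a minimal sketch in Java of logging the steps of a move so that a new master can finish it. All names (OperationLogSketch, Step, pendingSteps) are hypothetical and not existing HBase code, and the in-memory list only stands in for durable storage such as ZK or META.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

/** Hypothetical sketch: a write-ahead log of master operation steps. */
class OperationLogSketch {
    /** The two steps of a region move: close on the old RS, open on the new RS. */
    enum Step { CLOSE_ON_SOURCE, OPEN_ON_TARGET }

    /** One log record: which operation, which region, which step, and whether it finished. */
    static final class Entry {
        final long opId;
        final String region;
        final Step step;
        final boolean done;

        Entry(long opId, String region, Step step, boolean done) {
            this.opId = opId;
            this.region = region;
            this.step = step;
            this.done = done;
        }
    }

    // Assumption: stands in for durable storage; a real log must survive master failure.
    private final List<Entry> log = new ArrayList<>();

    /** Appends a record for one step of an operation. */
    void append(long opId, String region, Step step, boolean done) {
        log.add(new Entry(opId, region, step, done));
    }

    /** Returns, in order, the steps of the operation that are not yet logged as done. */
    List<Step> pendingSteps(long opId) {
        EnumSet<Step> pending = EnumSet.allOf(Step.class);
        for (Entry e : log) {
            if (e.opId == opId && e.done) {
                pending.remove(e.step);
            }
        }
        return new ArrayList<>(pending);
    }
}
```

If the master dies after the close is logged but before the open, a new master calling pendingSteps for that operation would get back only OPEN_ON_TARGET and could resume from there.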

bq. The "design principles" is actually more of a proposed design.
Yeah, sorry, wanted to split it into two sections but never did. Will rename.

bq. how do we deal with operations where we need "locks" on multiple regions 
because we are reading or modifying multiple regions – e.g. splits, merges, 
snapshots? Matteo Bertozzi had suggested in another jira making a meta row 
per table, or maybe part of the solution is using the multi-row single meta 
region transaction.
Depends on where we store it, but yeah, these have to be transactional. The last 
section (very short :)) suggests using ZK, which already supports that.
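
As a rough illustration of what "transactional" means for a multi-region operation such as a merge, here is a sketch in Java of an all-or-nothing, version-checked update. It is an in-memory stand-in with hypothetical names (RegionStateStoreSketch, Update, multi), not ZK client code; with ZK the same shape would presumably be expressed through its multi-op call (ZooKeeper.multi with Op.check and Op.setData).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory store whose multi-region updates apply fully or not at all. */
class RegionStateStoreSketch {
    /** A region state together with the version used for compare-and-set. */
    static final class Versioned {
        final String state;
        final int version;

        Versioned(String state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    /** One requested change: move the region to newState if its version still matches. */
    static final class Update {
        final String region;
        final int expectedVersion;
        final String newState;

        Update(String region, int expectedVersion, String newState) {
            this.region = region;
            this.expectedVersion = expectedVersion;
            this.newState = newState;
        }
    }

    private final Map<String, Versioned> regions = new HashMap<>();

    /** Registers a region at version 0. */
    synchronized void put(String region, String state) {
        regions.put(region, new Versioned(state, 0));
    }

    synchronized Versioned get(String region) {
        return regions.get(region);
    }

    /**
     * Applies every update or none of them.
     * Assumes each region appears at most once in the list.
     */
    synchronized boolean multi(List<Update> updates) {
        // Phase 1: check all versions before touching anything.
        for (Update u : updates) {
            Versioned current = regions.get(u.region);
            if (current == null || current.version != u.expectedVersion) {
                return false;
            }
        }
        // Phase 2: all checks passed, so apply all changes and bump versions.
        for (Update u : updates) {
            regions.put(u.region, new Versioned(u.newState, u.expectedVersion + 1));
        }
        return true;
    }
}
```

A merge of two regions would submit both updates in one multi call; if a concurrent operation has already changed either region, the version check fails and neither region is modified.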


bq. What are alternatives? why this approach vs others?
I can expand the doc... the implicitly mentioned existing alternatives are 
locks, which I would argue scale worse and are harder to manage; or the 
transaction approach that is currently used (although not unified), for example 
via transient transaction nodes.

Actually, one alternative approach I saw used for such things is to simplify 
concurrency of operations etc. with an actor-like model, where the master has 
the logical cluster state and the previously saved target state, and 
periodically (often) takes an epic lock, looks at them quickly, and based on 
what it is doing, outputs a new target cluster state and a list of physical 
things to do. Then it releases the epic lock, the new target state is saved, 
and the operations are performed.
That way all state-management code becomes simple, because it runs in one place 
with no concurrency, and recovery just has to compare real cluster state with 
destination state. 
But this will require thinking about this differently. 
Also, usually that would mean RSes won't be able to initiate operations (like 
split) - they will have to go through the master (which I would argue is ok).
Also it's not clear whether this will become too much of a bottleneck.
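
A very rough sketch of that model in Java, to show why the state-management code gets simple. Everything here (ClusterPlannerSketch, reportActual, setTarget, plan, the action strings) is hypothetical and only illustrates the diff between actual and target state; it also assumes the only physical actions are close and open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical planner: one lock, one place that compares actual and target state. */
class ClusterPlannerSketch {
    private final Object epicLock = new Object();
    // region -> server that currently hosts it (logical cluster state)
    private final Map<String, String> actual = new TreeMap<>();
    // region -> server that should host it (saved target state)
    private final Map<String, String> target = new TreeMap<>();

    /** Records what the cluster actually looks like, e.g. from RS reports. */
    void reportActual(String region, String server) {
        synchronized (epicLock) {
            actual.put(region, server);
        }
    }

    /** Records the desired placement; a move is just a change of target. */
    void setTarget(String region, String server) {
        synchronized (epicLock) {
            target.put(region, server);
        }
    }

    /**
     * Takes the lock briefly and returns the physical actions needed to reach the
     * target. The caller performs them after the lock is released.
     */
    List<String> plan() {
        List<String> actions = new ArrayList<>();
        synchronized (epicLock) {
            for (Map.Entry<String, String> e : target.entrySet()) {
                String region = e.getKey();
                String want = e.getValue();
                String have = actual.get(region);
                if (want.equals(have)) {
                    continue; // already where it should be
                }
                if (have != null) {
                    actions.add("close " + region + " on " + have);
                }
                actions.add("open " + region + " on " + want);
            }
        }
        return actions;
    }
}
```

Recovery after a master failure is the same call: reload the saved target, collect the actual state, and run plan() again.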


bq. Where do you think the new information will be, META table?
It seems to me that ZK would be better (see last section), but META is also an 
option.

From the spreadsheet:
bq. Enabling and disabling table operations should be blocked when any of 
these simple region operations are in progress
Not clear why (logically).
bq. move
Move is close and open, doesn't require consistency, right?
bq. Regionserver Processes ... However, the individual operations must  
maintain the table integrity property.
Not clear what this means for snapshots.
  
                
> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>         Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online schema change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core 
> components
