[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617600#comment-13617600
]
Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------
bq. Any major-overhaul solution should make sure that these operations, when
issued concurrently, interact according to a sane set of semantics in the face
of failures.
This is another (although not orthogonal) question.
I am looking for a sane way to define and enforce arbitrary semantics first.
Then sane semantics can be enforced on top of that :)
For example, the "actor-ish" model described below would make it easy to write
simple code; persistent state would ensure there is a definite state at any
time, and all crucial transitions would be atomic, so semantics would be easy
to enforce as long as the code can handle a failed transition/recovery. Locks
also make this simple, although locks have other problems imho.
We can go either way, though: if we define sane semantics first, it would be
easy to see how convenient they are to implement in a particular model.
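To make the persistent-state/"actor-ish" idea a bit more concrete, here is a
minimal sketch of what I mean (all names are made up for illustration, this is
not proposed code): every crucial transition is durably recorded with a
compare-and-set before the work proceeds, so recovery only has to look at the
last persisted state.
{code}
import java.io.IOException;

enum RegionOpState { OFFLINE, OPENING, OPEN, CLOSING, SPLITTING }

interface StateStore {
  // Atomically persist the transition; fail if the current state is not 'from'
  // (i.e. someone else already moved the region).
  void compareAndTransition(String region, RegionOpState from, RegionOpState to)
      throws IOException;
  RegionOpState get(String region) throws IOException;
}

class RegionOpenActor {
  private final StateStore store;
  RegionOpenActor(StateStore store) { this.store = store; }

  void open(String region) throws IOException {
    store.compareAndTransition(region, RegionOpState.OFFLINE, RegionOpState.OPENING);
    // ... do the actual open work; if we crash here, recovery finds OPENING
    // and can either retry the open or roll the region back to OFFLINE ...
    store.compareAndTransition(region, RegionOpState.OPENING, RegionOpState.OPEN);
  }
}
{code}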
bq. So I buy open/close as a region operation. split/merge are multi region
operations – is there enough state to recover from a failure?
There should be. Can you elaborate?
bq. So alter table is a region operation? Why isn't it in the state machine?
Alter table is currently an operation that involves region operations, namely
open/close. Open/close are in the state machine :) As for tables, I am not sure
a state machine is the best model for table state; there isn't that much going
on with a table that properly constitutes an exclusive state.
bq. Implementing region locks is too far – I'm asking for some back of the
napkin discussion.
If a server holds a lock for a region for time Tlock during each day, and the
number of regions is N, the probability of some region lock (or table read-only
lock) being held at any given instant is 1-(1-Tlock/Tday)^N, if I am writing
this correctly. For 5 seconds of locking per day per region, and 10000 regions
(not unreasonable for a large table/cluster), some lock would be held about
44% of the time for region operations.
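For concreteness, a quick check of that number (just the arithmetic above,
nothing HBase-specific):
{code}
public class LockProbability {
  public static void main(String[] args) {
    double tLock = 5.0;          // seconds a given region's lock is held per day
    double tDay  = 24 * 3600.0;  // seconds in a day
    int n = 10000;               // number of regions
    // P(some lock is held at a random instant), assuming regions are independent
    double p = 1 - Math.pow(1 - tLock / tDay, n);
    System.out.printf("%.2f%n", p);  // prints 0.44
  }
}
{code}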
Calculating the probability of any lock being in recovery (a server went down
holding a lock less than the recovery time ago) can also be done, but the
numbers for some parameters (how often do servers go down?) would be very
speculative...
bq. I think we need some measurements of how much throughput we can get in ZK
or with a ZK-lock implementation and compare this with # of RS watchers * # of
regions * number of ops...
Will there be many watchers/ops? You only watch and do ops when you acquire the
lock, so unless region operations are very frequent...
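As an illustration of why I expect little traffic, here is a sketch along the
lines of the standard ZK lock recipe (illustrative only, not a proposal for the
actual implementation): a watch is set only on the immediate predecessor node,
and only while the lock is contended, so regions with no ongoing operations
generate no watches or ops at all.
{code}
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RegionLockSketch {
  // Blocks until the caller holds the lock under lockDir (e.g. one dir per region).
  public static void lock(ZooKeeper zk, String lockDir)
      throws KeeperException, InterruptedException {
    String me = zk.create(lockDir + "/lock-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    String myNode = me.substring(me.lastIndexOf('/') + 1);
    while (true) {
      List<String> children = zk.getChildren(lockDir, false);
      Collections.sort(children);
      int idx = children.indexOf(myNode);
      if (idx == 0) {
        return;  // lowest sequence number: lock acquired, no watch was ever set
      }
      // Contended: watch only the immediate predecessor until it goes away.
      String prev = lockDir + "/" + children.get(idx - 1);
      CountDownLatch gone = new CountDownLatch(1);
      if (zk.exists(prev, event -> gone.countDown()) == null) {
        continue;  // predecessor already gone, re-check our position
      }
      gone.await();
    }
  }
}
{code}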
bq. The current regions-in-transition (RIT) code basically assumes that an
absent znode is either closed or opened. RIT znodes are present when the region
is in the in-between states (opening, closing,
I don't think "either closed or opened" is good enough :) Also, RITs don't
cover all scenarios and things like table ops don't use them at all.
bq. I know I've suggested something like this before. Currently the RS
initiates a split, and does the region open/meta changes. If there are errors,
at some point the master side detects a timeout. An alternative would have
splits initiated on the RS but have the master do some kind of atomic changes
to meta and region state for the 3 involved regions (parent, daughter A and
daughter B).
Yeah, although in other models (locks, persistent state) that is not required.
Also, if meta is a cache for clients and not the source of truth, meta changes
can still be done on the server; I assume by meta you mean global state,
wherever that is?
bq. We need to be careful about ZK – since it is a network connection also,
exceptions could be failures or timeouts (which succeeded but weren't able to
ack). If we can describe the properties (durable vs erasable) and assumptions
(if the wipeable ZK is the source of truth, how do we make sure the version
state is recoverable without time travel?)
The former applies to any distributed state; as for the latter, I was thinking
of ZK+"WAL" if we intend to keep ZK wipeable.
> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
> Issue Type: New Feature
> Components: master, regionserver, Zookeeper
> Affects Versions: 0.94.0
> Reporter: Mubarak Seyed
> Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant
> manner.
> Master-coordinated tasks such as online schema change and delete-range
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core
> components