[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617600#comment-13617600
]
Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------
bq. Any major-overhaul solution should make sure that these operations, when
issued concurrently, interact according to a sane set of semantics in the face
of failures.
This is another (although not orthogonal) question.
I am looking for a sane way to define and enforce arbitrary semantics first.
Then sane semantics can be enforced on top of that :)
For example, the "actor-ish" model described below would make it easy to write
simple code; persistent state would ensure there is a definite state at any
time, and all crucial transitions would be atomic, so semantics would be easy
to enforce as long as the code can handle a failed transition/recovery. Locks
also make this simple, although locks have other problems imho.
We can go either way, though: if we define sane semantics first, it would be
easy to see how convenient they are to implement in a particular model.
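To make the persistent-state/"actor-ish" idea a bit more concrete, here is a
minimal sketch of what I mean (all names are made up for illustration, this is
not proposed code): every crucial transition is durably recorded with a
compare-and-set before the work proceeds, so recovery only has to look at the
last persisted state.
{code}
import java.io.IOException;

enum RegionOpState { OFFLINE, OPENING, OPEN, CLOSING, SPLITTING }

interface StateStore {
  // Atomically persist the transition; fail if the current state is not 'from'
  // (i.e. someone else already moved the region).
  void compareAndTransition(String region, RegionOpState from, RegionOpState to)
      throws IOException;
  RegionOpState get(String region) throws IOException;
}

class RegionOpenActor {
  private final StateStore store;
  RegionOpenActor(StateStore store) { this.store = store; }

  void open(String region) throws IOException {
    store.compareAndTransition(region, RegionOpState.OFFLINE, RegionOpState.OPENING);
    // ... do the actual open work; if we crash here, recovery finds OPENING
    // and can either retry the open or roll the region back to OFFLINE ...
    store.compareAndTransition(region, RegionOpState.OPENING, RegionOpState.OPEN);
  }
}
{code}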
bq. So I buy open/close as a region operation. split/merge are multi region
operations – is there enough state to recover from a failure?
There should be. Can you elaborate?
bq. So alter table is a region operation? Why isn't it in the state machine?
Alter table is currently an operation that involves region operations, namely
open/close. Open/close are in the state machine :) As for tables, I am not sure
a state machine is the best model for table state; there isn't that much going
on with a table that properly constitutes an exclusive state.
bq. Implementing region locks is too far – I'm asking for some back of the
napkin discussion.
If a server holds a lock for a region for time Tlock during each day, and the
number of regions is N, the probability of some region lock (or table read-only
lock) being held at any given instant is 1-(1-Tlock/Tday)^N, if I am writing
this correctly. For 5 seconds of locking per day per region, and 10000 regions
(not unreasonable for a large table/cluster), some lock would be held about
44% of the time for region operations.
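For concreteness, a quick check of that number (just the arithmetic above,
nothing HBase-specific):
{code}
public class LockProbability {
  public static void main(String[] args) {
    double tLock = 5.0;          // seconds a given region's lock is held per day
    double tDay  = 24 * 3600.0;  // seconds in a day
    int n = 10000;               // number of regions
    // P(some lock is held at a random instant), assuming regions are independent
    double p = 1 - Math.pow(1 - tLock / tDay, n);
    System.out.printf("%.2f%n", p);  // prints 0.44
  }
}
{code}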
Calculating the probability of any lock being in recovery (a server went down
holding a lock less than the recovery time ago) can also be done, but the
numbers for some parameters (how often do servers go down?) would be very
speculative...
bq. I think we need some measurements of how much throughput we can get in ZK
or with a ZK-lock implementation and compare this with # of RS watchers * # of
regions * number of ops...
Will there be many watchers/ops? You only watch and do ops when you acquire the
lock, so unless region operations are very frequent...
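As an illustration of why I expect little traffic, here is a sketch along the
lines of the standard ZK lock recipe (illustrative only, not a proposal for the
actual implementation): a watch is set only on the immediate predecessor node,
and only while the lock is contended, so regions with no ongoing operations
generate no watches or ops at all.
{code}
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RegionLockSketch {
  // Blocks until the caller holds the lock under lockDir (e.g. one dir per region).
  public static void lock(ZooKeeper zk, String lockDir)
      throws KeeperException, InterruptedException {
    String me = zk.create(lockDir + "/lock-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    String myNode = me.substring(me.lastIndexOf('/') + 1);
    while (true) {
      List<String> children = zk.getChildren(lockDir, false);
      Collections.sort(children);
      int idx = children.indexOf(myNode);
      if (idx == 0) {
        return;  // lowest sequence number: lock acquired, no watch was ever set
      }
      // Contended: watch only the immediate predecessor until it goes away.
      String prev = lockDir + "/" + children.get(idx - 1);
      CountDownLatch gone = new CountDownLatch(1);
      if (zk.exists(prev, event -> gone.countDown()) == null) {
        continue;  // predecessor already gone, re-check our position
      }
      gone.await();
    }
  }
}
{code}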
bq. The current regions-in-transition (RIT) code basically assumes that an
absent znode is either closed or opened. RIT znodes are present when the region
is in the in-between states (opening, closing,
I don't think "either closed or opened" is good enough :) Also, RITs don't
cover all scenarios and things like table ops don't use them at all.
bq. I know I've suggested something like this before. Currently the RS
initiates a split, and does the region open/meta changes. If there are errors,
at some point the master side detects a timeout. An alternative would have
splits initiated on the RS but have the master do some kind of atomic changes
to meta and region state for the 3 involved regions (parent, daughter A and
daughter B).
Yeah, although in other models (locks, persistent state) that is not required.
Also, if meta is a cache for clients and not the source of truth, meta changes
can still be done on the server; I assume by meta you mean global state,
wherever that is?
bq. We need to be careful about ZK – since it is a network connection also,
exceptions could be failures or timeouts (which succeeded but weren't able to
ack). If we can describe the properties (durable vs erasable) and assumptions
(if the wipeable ZK is the source of truth, how do we make sure the version
state is recoverable without time travel?)
The former applies to any distributed state; as for the latter, I was thinking
of ZK+"WAL" if we intend to keep ZK wipeable.
> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
> Issue Type: New Feature
> Components: master, regionserver, Zookeeper
> Affects Versions: 0.94.0
> Reporter: Mubarak Seyed
> Attachments: Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant
> manner.
> Master-coordinated tasks such as online schema change and delete-range
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core
> components