[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799576#comment-13799576
]
Jonathan Hsieh edited comment on HBASE-5487 at 10/18/13 10:21 PM:
------------------------------------------------------------------
Yesterday, I shared with sergey and some of the folks interested this a draft
of the design I've been working on (I'll call it the hbck-master) and a list of
questions related to Sergey's design. Since sergey's has got master5 in the
name of the doc I'll refer to it as "master5". He's answered some question in
email but we should do technical discussions out here. We'll be working
together to hash out holes in each others designs and potentially merge
designs.
----
I have a lot of questions. I'll hit the big questions first. Also would i be
possible to put a version of this up as gdoc so we can point out nits and
places that need minor clarification? (I have a marked up physical copy
version of the doc, would be easier to provide feedback).
Main Concerns:
What is a failure and how do you react to failures? I think the master5 design
needs to spend more effort to considering failure and recovery cases. I claim
there are 4 types of responses from a networked IO operation - two states we
normally deal with ack successful, ack failed (nack) and unknown due to timeout
that succeeded (timeout success) and unknown due to timeout that failed
(timeout failed). We have historically missed the last two timeout cases or
assumed timeout means failure nack. It seems that master5 makes the same
assumptions.
I'm very concerned about what we need to do to invalidate information cached RS
information at clients in the case of hang, and that will violate the isolation
guarantees that we claim to provide. I really want a slice in-depth failure
handling case analysis including a client with cached rs assignments for move
and something more complicated such as split or alter.
I really want more invariant specified for the FSM states. e.g. if a region is
in state X, does it have a row in meta? does have data on the FS? is it open on
another region? is it open on only one region? I think having 8 pages of tables
at the back of the master5 doc can be more concise and precise which will help
us get attempt to prove correctness.
Clarification questions:
1) State update coordination. What is a "state updates from the outside" Do
RS's initiate splitting on their own? Maybe a picture would help so we can
figure out if it is similar or different from hbck-master's?
2) Single point of truth. What is this truth? what the user specficied
actions? what the rs's are reporting? the last state we were confirmed to be
at? hbck-master tries to define what single point of truth means by defining
intended, current, and actual state data with durability properties on each
kind. What do clients look at who modifies what?
3) Table record: "if regions is out of date, it should be closed and reopened".
It is not clear in master5 how regionservers find out that they are out of
date. Moreover, how do clients talking to those RS's with stale versions know
they are going to the correct RS especially in the face of RS failures due to
timeout?
4) region record: transition states. Shouldn't be defined as part of the
region record? (This is really similar to hbck-masters current state and
intended state. )
5) Note on user operations: the forgetting thing is scary to me -- in your move
split example, what happens if an RS reads state that is forgotten?
6) table state machine. how do we guarantee clients are not writing to against
out of date region versions? (in hang situations, regions could be open on
multple places -- the hung RS and the new RS the region was assigned to and
successfully opened on)
7) region state machine. Earlier draft hand splitting and merge cases. Are
they elided in master5 or are not present any more. How would this get extended
handle jeffrey's distributed log replay/fast write recovery feature?
8) logical interactions: sounds like master5 allows concurrent operations in
specfiic regions and and specfiic table. (e.g. it will allow moves and splits
and merges on the same region). hbck-master (though not fully documented) only
allows certain region transitions when the table is enabled or if the table is
disabled. Are we sure we don't get into race conditions? What happens if
disable gets issued -- its possible for someone to reopens the region and for
old clients to continue writing to it even though it is closed?
nit. 9) "in cursive" mean in italics. :)
10) The table operations section have tables which I believe are the actions
between FSM states in the table or region fsms. Is this correct? Can the
edges be labeled to describe which steps these transitions correspond to?
Short doc:
nit: Design Constraints, code should: Have AM logic isolated from the
persistent storage of state.
// I think this should be "abstracted" so we can plug in different
implementations of persistent storage of state.
was (Author: jmhsieh):
Yesterday, I shared with sergey and some of the folks interested this a draft
of the design I've been working on (I'll call it the hbck-master) and a list of
questions related to Sergey's design. Since sergey's has got master5 in the
name of the doc I'll refer to it as "master5". He's answered some question in
email but we should do technical discussions out here. We'll be working
together to hash out holes in each others designs and potentially merge
designs.
----
I have a lot of questions. I'll hit the big questions first. Also would i be
possible to put a version of this up as gdoc so we can point out nits and
places that need minor clarification? (I have a marked up physical copy
version of the doc, would be easier to provide feedback).
Main Concerns:
What is a failure and how do you react to failures? I think the master5 design
needs to spend more effort to considering failure and recovery cases. I claim
there are 4 types of responses from a networked IO operation - two states we
normally deal with ack successful, ack failed (nack) and unknown due to timeout
that succeeded (timeout success) and unknown due to timeout that failed
(timeout failed). We have historically missed the last two timeout cases or
assumed timeout means failure nack. It seems that master5 makes the same
assumptions.
I'm very concerned about what we need to do to invalidate information cached RS
information at clients in the case of hang, and that will violate the isolation
guarantees that we claim to provide. I really want a slice in-depth failure
handling case analysis including the master with cached rs assignments for move
and something more complicated such as split or alter.
I really want more invariant specified for the FSM states. e.g. if a region is
in state X, does it have a row in meta? does have data on the FS? is it open on
another region? is it open on only one region? I think having 8 pages of tables
at the back of the master5 doc can be more concise and precise which will help
us get attempt to prove correctness.
Clarification questions:
1) State update coordination. What is a "state updates from the outside" Do
RS's initiate splitting on their own? Maybe a picture would help so we can
figure out if it is similar or different from hbck-master's?
2) Single point of truth. What is this truth? what the user specficied
actions? what the rs's are reporting? the last state we were confirmed to be
at? hbck-master tries to define what single point of truth means by defining
intended, current, and actual state data with durability properties on each
kind. What do clients look at who modifies what?
3) Table record: "if regions is out of date, it should be closed and reopened".
It is not clear in master5 how regionservers find out that they are out of
date. Moreover, how do clients talking to those RS's with stale versions know
they are going to the correct RS especially in the face of RS failures due to
timeout?
4) region record: transition states. Shouldn't be defined as part of the
region record? (This is really similar to hbck-masters current state and
intended state. )
5) Note on user operations: the forgetting thing is scary to me -- in your move
split example, what happens if an RS reads state that is forgotten?
6) table state machine. how do we guarantee clients are not writing to against
out of date region versions? (in hang situations, regions could be open on
multple places -- the hung RS and the new RS the region was assigned to and
successfully opened on)
7) region state machine. Earlier draft hand splitting and merge cases. Are
they elided in master5 or are not present any more. How would this get extended
handle jeffrey's distributed log replay/fast write recovery feature?
8) logical interactions: sounds like master5 allows concurrent operations in
specfiic regions and and specfiic table. (e.g. it will allow moves and splits
and merges on the same region). hbck-master (though not fully documented) only
allows certain region transitions when the table is enabled or if the table is
disabled. Are we sure we don't get into race conditions? What happens if
disable gets issued -- its possible for someone to reopens the region and for
old clients to continue writing to it even though it is closed?
nit. 9) "in cursive" mean in italics. :)
10) The table operations section have tables which I believe are the actions
between FSM states in the table or region fsms. Is this correct? Can the
edges be labeled to describe which steps these transitions correspond to?
Short doc:
nit: Design Constraints, code should: Have AM logic isolated from the
persistent storage of state.
// I think this should be "abstracted" so we can plug in different
implementations of persistent storage of state.
> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
> Issue Type: New Feature
> Components: master, regionserver, Zookeeper
> Affects Versions: 0.94.0
> Reporter: Mubarak Seyed
> Assignee: Sergey Shelukhin
> Priority: Critical
> Attachments: Entity management in Master - part 1.pdf,
> hbckMasterV2-long.pdf, Region management in Master5.docx, Region management
> in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant
> manner.
> Master-coordinated tasks such as online-scheme change and delete-range
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of framework are
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core
> components
--
This message was sent by Atlassian JIRA
(v6.1#6144)