[jira] [Comment Edited] (HBASE-5487) Generic framework for Master-coordinated tasks

Jonathan Hsieh (JIRA) Fri, 18 Oct 2013 15:22:38 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799576#comment-13799576
 ]


Jonathan Hsieh edited comment on HBASE-5487 at 10/18/13 10:21 PM:
------------------------------------------------------------------

Yesterday, I shared with sergey and some of the folks interested this a draft 
of the design I've been working on (I'll call it the hbck-master) and a list of 
questions related to Sergey's design.  Since sergey's has got master5 in the 
name of the doc I'll refer to it as "master5".  He's answered some question in 
email but we should do technical discussions out here.  We'll be working 
together to hash out holes in each others designs and potentially merge 
designs.  

----

I have a lot of questions.  I'll hit the big questions first. Also would i be 
possible to put a version of this up as gdoc so we can point out nits and 
places that need minor clarification?   (I have a marked up physical copy 
version of the doc, would be easier to provide feedback).

Main Concerns:

What is a failure and how do you react to failures? I think the master5 design 
needs to spend more effort  to considering failure and recovery cases. I claim 
there are 4 types of responses from a networked IO operation -  two states we 
normally deal with ack successful, ack failed (nack) and unknown due to timeout 
 that succeeded (timeout success) and unknown due to timeout that failed 
(timeout failed). We have historically missed the last two timeout cases or 
assumed timeout means failure nack. It seems that master5 makes the same 
assumptions. 

I'm very concerned about what we need to do to invalidate information cached RS 
information at clients in the case of hang, and that will violate the isolation 
guarantees that we claim to provide.  I really want a slice in-depth failure 
handling case analysis including a client with cached rs assignments for move 
and something more complicated such as split or alter.

I really want more invariant specified for the FSM states.  e.g. if a region is 
in state X, does it have a row in meta? does have data on the FS? is it open on 
another region? is it open on only one region? I think having 8 pages of tables 
at the back of the master5 doc can be more concise and precise which will help 
us get attempt to prove correctness.  

Clarification questions:

1)  State update coordination.  What is a "state updates from the outside"  Do 
RS's initiate splitting on their own?  Maybe a picture would help so we can 
figure out if it is similar or different from hbck-master's?

2) Single point of truth.  What is this truth? what the user specficied 
actions?  what the rs's are reporting?  the last state we were confirmed to be 
at? hbck-master tries to define what single point of truth means by defining 
intended, current, and actual state data with durability properties on each 
kind. What do clients look at who modifies what? 

3) Table record: "if regions is out of date, it should be closed and reopened". 
It is not clear in master5 how regionservers find out that they are out of 
date. Moreover, how do clients talking to those RS's with stale versions know 
they are going to the correct RS especially in the face of RS failures due to 
timeout?

4) region record: transition states.  Shouldn't be defined as part of the 
region record? (This is really similar to hbck-masters current state and 
intended state. )

5) Note on user operations: the forgetting thing is scary to me -- in your move 
split example, what happens if an RS reads state that is forgotten?  

6) table state machine. how do we guarantee clients are not writing to against 
out of date region versions? (in hang situations, regions could be open on 
multple places -- the hung RS and the new RS the region was assigned to and 
successfully opened on)  

7) region state machine.  Earlier draft hand splitting and merge cases.  Are 
they elided in master5 or are not present any more. How would this get extended 
handle jeffrey's distributed log replay/fast write recovery feature?  

8) logical interactions:  sounds like master5 allows concurrent operations in 
specfiic regions and and specfiic table.  (e.g. it will allow moves and splits 
and merges on the same region).  hbck-master (though not fully documented) only 
allows certain region transitions when the table is enabled or if the table is 
disabled.  Are we sure we don't get into race conditions?  What happens if 
disable gets issued -- its possible for someone to reopens the region and for 
old clients to continue writing to it even though it is closed?

nit. 9) "in cursive" mean in italics. :) 

10) The table operations section have tables which I believe are the actions 
between FSM states in the table or region fsms.  Is this correct?  Can the 
edges be labeled to describe which steps these transitions correspond to?

Short doc:
nit: Design Constraints, code should: Have AM logic isolated from the 
persistent storage of state.  
// I think this should be "abstracted" so we can plug in different 
implementations of persistent storage of state.


was (Author: jmhsieh):
Yesterday, I shared with sergey and some of the folks interested this a draft 
of the design I've been working on (I'll call it the hbck-master) and a list of 
questions related to Sergey's design.  Since sergey's has got master5 in the 
name of the doc I'll refer to it as "master5".  He's answered some question in 
email but we should do technical discussions out here.  We'll be working 
together to hash out holes in each others designs and potentially merge 
designs.  

----

I have a lot of questions.  I'll hit the big questions first. Also would i be 
possible to put a version of this up as gdoc so we can point out nits and 
places that need minor clarification?   (I have a marked up physical copy 
version of the doc, would be easier to provide feedback).

Main Concerns:

What is a failure and how do you react to failures? I think the master5 design 
needs to spend more effort  to considering failure and recovery cases. I claim 
there are 4 types of responses from a networked IO operation -  two states we 
normally deal with ack successful, ack failed (nack) and unknown due to timeout 
 that succeeded (timeout success) and unknown due to timeout that failed 
(timeout failed). We have historically missed the last two timeout cases or 
assumed timeout means failure nack. It seems that master5 makes the same 
assumptions. 

I'm very concerned about what we need to do to invalidate information cached RS 
information at clients in the case of hang, and that will violate the isolation 
guarantees that we claim to provide.  I really want a slice in-depth failure 
handling case analysis including the master with cached rs assignments for move 
and something more complicated such as split or alter.

I really want more invariant specified for the FSM states.  e.g. if a region is 
in state X, does it have a row in meta? does have data on the FS? is it open on 
another region? is it open on only one region? I think having 8 pages of tables 
at the back of the master5 doc can be more concise and precise which will help 
us get attempt to prove correctness.  

Clarification questions:

1)  State update coordination.  What is a "state updates from the outside"  Do 
RS's initiate splitting on their own?  Maybe a picture would help so we can 
figure out if it is similar or different from hbck-master's?

2) Single point of truth.  What is this truth? what the user specficied 
actions?  what the rs's are reporting?  the last state we were confirmed to be 
at? hbck-master tries to define what single point of truth means by defining 
intended, current, and actual state data with durability properties on each 
kind. What do clients look at who modifies what? 

3) Table record: "if regions is out of date, it should be closed and reopened". 
It is not clear in master5 how regionservers find out that they are out of 
date. Moreover, how do clients talking to those RS's with stale versions know 
they are going to the correct RS especially in the face of RS failures due to 
timeout?

4) region record: transition states.  Shouldn't be defined as part of the 
region record? (This is really similar to hbck-masters current state and 
intended state. )

5) Note on user operations: the forgetting thing is scary to me -- in your move 
split example, what happens if an RS reads state that is forgotten?  

6) table state machine. how do we guarantee clients are not writing to against 
out of date region versions? (in hang situations, regions could be open on 
multple places -- the hung RS and the new RS the region was assigned to and 
successfully opened on)  

7) region state machine.  Earlier draft hand splitting and merge cases.  Are 
they elided in master5 or are not present any more. How would this get extended 
handle jeffrey's distributed log replay/fast write recovery feature?  

8) logical interactions:  sounds like master5 allows concurrent operations in 
specfiic regions and and specfiic table.  (e.g. it will allow moves and splits 
and merges on the same region).  hbck-master (though not fully documented) only 
allows certain region transitions when the table is enabled or if the table is 
disabled.  Are we sure we don't get into race conditions?  What happens if 
disable gets issued -- its possible for someone to reopens the region and for 
old clients to continue writing to it even though it is closed?

nit. 9) "in cursive" mean in italics. :) 

10) The table operations section have tables which I believe are the actions 
between FSM states in the table or region fsms.  Is this correct?  Can the 
edges be labeled to describe which steps these transitions correspond to?

Short doc:
nit: Design Constraints, code should: Have AM logic isolated from the 
persistent storage of state.  
// I think this should be "abstracted" so we can plug in different 
implementations of persistent storage of state.

> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>         Attachments: Entity management in Master - part 1.pdf, 
> hbckMasterV2-long.pdf, Region management in Master5.docx, Region management 
> in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online-scheme change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of framework are
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core 
> components



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Comment Edited] (HBASE-5487) Generic framework for Master-coordinated tasks

Reply via email to