[ 
https://issues.apache.org/jira/browse/HELIX-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221312#comment-14221312
 ] 

Zhen Zhang commented on HELIX-541:
----------------------------------

because the new ideal-state is:
{noformat}
node_2: LEADER
node_1: STANDBY
{noformat}

and the current-state is:
{noformat}
node_0: LEADER
node_1: STANDBY
{noformat}

controller figures out it has to send two transitions:
{noformat}
To node_0: LEADER->STANDBY
To node_2: OFFLINE->STANDBY
{noformat}

node_1 remains in STANDBY so controller will not send any transition to it.



> Possible "livelock" in Helix controller
> ---------------------------------------
>
>                 Key: HELIX-541
>                 URL: https://issues.apache.org/jira/browse/HELIX-541
>             Project: Apache Helix
>          Issue Type: Bug
>            Reporter: Zhen Zhang
>
> We discover a "livelock" bug in Helix controller. This leads to HELIX-540. 
> Assuming we have 3 partitions and 2 nodes, using LeaderStandby state model, 
> FULL_AUTO mode, and replica is 2. When reaching stable mapping, we might have 
> the following:
> {noformat}
> partition_0:
>   node_0: LEADER
>   node_1: STANDBY
> partition_1:
>   node_1: LEADER
>   node_0: STANDBY
> partition_2:
>   node_0: LEADER
>   node_1: STANDBY
> {noformat}
> Later we add a new node (node_2) to the cluster and rebalancer decides that 
> node_2 should become LEADER for partition_2. So controller first sends 
> OFFLINE->STANDBY transition to node_2, and the mapping becomes:
> {noformat}
> partition_0:
>   node_0: LEADER
>   node_1: STANDBY
> partition_1:
>   node_1: LEADER
>   node_0: STANDBY
> partition_2:
>   node_0: LEADER
>   node_1: STANDBY
>   node_2: STANDBY
> {noformat}
> Note that given LEADER state count is 1 and STANDBY state count is R, where 
> R=2, it is implying the following state constraints:
> {noformat}
> LEADER: upper_bound=1
> STANDBY: upper_bound=2
> {noformat}
> Helix controller now enters the "livelock": it can't send STANDBY->LEADER to 
> node_2, since this will violate LEADER upper bound; it can't send 
> LEADER->STANDBY to node_0 either, since will violate STANDBY upper bound.
> We can solve the problem in several ways:
> 1) State count definition is ambiguous. For some state, like LEADER, when we 
> say state_count=1, that means we can't violate this constraint at any time. 
> However, for some other state, like STANDBY, when we say state_count=R, that 
> means in stable mapping, there should be R-1 STANDBY replicas, but we don't 
> care the count in any transient state. In this case, we can set STANDBY 
> upper_bound to be larger than 2. Note that this doesn't solve the problem in 
> general. We may have some state model that has a restrict requirement on 
> STANDBY state.
> 2) Using state transition priority. If we define LEADER->STANDBY transition 
> should have a higher priority than OFFLINE->STANDBY transition, it will solve 
> the "livelock". But this doesn't solve the problem in general either, because 
> when the state model gets complicated, it's hard to define and prove proper 
> transition priorities that avoid "livelock" in any situation.
> 3) The root cause of the problem is that Helix controller uses a greedy 
> algorithm that only looks one step ahead. In the example, if Helix controller 
> can look two steps further, then it will find out that sending 
> OFFLINE->STANDBY transition to node_2 will lead to a dead end, therefore it 
> should choose to send LEADER->STANDBY to node_0 instead. In general we might 
> need to do a DFS/BFS and it's hard if state model is complicated and system 
> is large.
> In practice, most systems use simple state models with less than 5 states and 
> have strict state constraint on a single state (e.g MASTER, LEADER) only.  We 
> can avoid "livelock" by carefully choosing state constraints.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to