Hi Flavio,

Here is my attempt (with rich text disabled):

Let's assume a large cluster and look at three nodes.

ep = electionEpoch, p = peerEpoch, z = zxid
step 1:
A [ ep:3, p:1, z: 1 ] [ LOOKING ] { failed to follow multiple times, hence resets Vote to stored values and bumps up epoch }
B [ ep:1, p:2, z: 9 ] [ FOLLOWING ]
C [ ep:1, p:2, z: 9 ] [ LEADING ]

step 2:

B goes into the LOOKING state but cannot reach A or C.

A [ ep:3, p:1, z: 1 ] [ LOOKING ] 
B [ ep:2, p:2, z: 99 ] [ LOOKING ] { starts vote with last committed transactions }
C [ ep:1, p:2, z: 99 ] [ LEADING ]

step 3:

C goes into the LOOKING state.

A [ ep:3, p:1, z: 1 ] [ LOOKING ] 
B [ ep:2, p:2, z: 99 ] [ LOOKING ]
C [ ep:2, p:2, z: 999 ] [ LOOKING ]

step 4:

Only B and C can reach each other so they converge.


A [ ep:3, p:1, z: 1 ] [ LOOKING ] 
B [ ep:2, p:2, z: 999 ] [ LOOKING ]
C [ ep:2, p:2, z: 999 ] [ LOOKING ]

step 5:

B now hears from A, and A still cannot see C.

At bit.ly/1kxjk5G: A.ep > B.logicalClock and totalOrderPredicate(A, B) is 
false, hence B resets to the values stored on disk (i.e. it loses the values it 
learned from C) but copies A's logicalClock.


A [ ep:3, p:1, z: 1 ] [ LOOKING ] 
B [ ep:3, p:2, z: 99 ] [ LOOKING ] { moved back zxid }
C [ ep:2, p:2, z: 999 ] [ LOOKING ]

Now A and B converge.

step 6:
A [ ep:3, p:2, z: 99 ] [ LOOKING ] 
B [ ep:3, p:2, z: 99 ] [ LOOKING ]
C [ ep:2, p:2, z: 999 ] [ LOOKING ]

My initial question is: what is the goal of resetting the proposal to the 
on-disk values when the received (Rx) proposal's electionEpoch is greater than 
the current logical clock and totalOrderPredicate(Rx, this) is false, as in 
step 5 above? This caused B to unlearn and then re-learn.
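
To make step 5 concrete, here is a minimal sketch of that branch as I read it. 
It is not the actual FastLeaderElection code: the class FleResetSketch, the 
Vote and Peer types, and the method onLookingNotification are hypothetical 
names for illustration, the server ids A=1, B=2, C=3 are assumed, and the 
ordering (peerEpoch, then zxid, then server id) is my understanding of 
totalOrderPredicate().

```java
public class FleResetSketch {
    /** A vote as written in this thread: proposed leader id, peerEpoch (p), zxid (z). */
    static final class Vote {
        final long leader, peerEpoch, zxid;
        Vote(long leader, long peerEpoch, long zxid) {
            this.leader = leader; this.peerEpoch = peerEpoch; this.zxid = zxid;
        }
        @Override public String toString() {
            return "[leader:" + leader + ", p:" + peerEpoch + ", z:" + zxid + "]";
        }
    }

    /** Assumed ordering: higher peerEpoch wins, then higher zxid, then higher server id. */
    static boolean totalOrderPredicate(Vote n, Vote cur) {
        if (n.peerEpoch != cur.peerEpoch) return n.peerEpoch > cur.peerEpoch;
        if (n.zxid != cur.zxid) return n.zxid > cur.zxid;
        return n.leader > cur.leader;
    }

    /** Hypothetical stand-in for the election state of a LOOKING peer. */
    static final class Peer {
        final Vote onDisk;   // initial vote built from persisted state
        Vote proposal;       // current proposal, possibly learned from other peers
        long logicalClock;   // the "ep" column in the steps above

        Peer(Vote onDisk, Vote proposal, long logicalClock) {
            this.onDisk = onDisk; this.proposal = proposal; this.logicalClock = logicalClock;
        }

        /** Handle a notification sent by a peer that is itself LOOKING. */
        void onLookingNotification(long electionEpoch, Vote n) {
            if (electionEpoch > logicalClock) {
                // Newer election round: adopt its epoch (the received-vote set
                // would also be cleared here) and compare the incoming vote
                // against the on-disk vote, not the current proposal.
                logicalClock = electionEpoch;
                proposal = totalOrderPredicate(n, onDisk) ? n : onDisk;
            } else if (electionEpoch == logicalClock && totalOrderPredicate(n, proposal)) {
                proposal = n; // same round: keep whichever vote is better
            }
            // A notification from an older round is ignored.
        }
    }

    public static void main(String[] args) {
        // B in step 4: on-disk vote z:99, proposal learned from C z:999, ep:2.
        Peer b = new Peer(new Vote(2, 2, 99), new Vote(3, 2, 999), 2);
        // Step 5: A's notification arrives with ep:3 and vote [p:1, z:1].
        b.onLookingNotification(3, new Vote(1, 1, 1));
        System.out.println("ep:" + b.logicalClock + " " + b.proposal);
        // prints: ep:3 [leader:2, p:2, z:99]
    }
}
```

Under these assumptions B ends at ep:3 with z:99, which is the "moved back 
zxid" state shown in step 5: the comparison is made against the on-disk vote, 
so what B learned from C in step 4 is dropped.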

Also, if you can shed some light on the role of ElectionEpoch, that would be 
great. Is it there because a Vote received by LeaderElection could be stale, 
and forcing the system to converge on an ElectionEpoch helps with liveness, 
i.e. electing a leader with reasonably recent votes? But why doesn't it 
consider a Vote from a LEADER/FOLLOWER for learning? Why is it necessary to 
learn only from LOOKING peers and not from LEADER/FOLLOWER peers?

Here is another case to illustrate this problem:

A[K], B[K], C[F], D[L], E[F] { K = looking, F = following, L = leading }

Consider a partitioned system where A and B can see C and D but not E, while C 
and D can see all nodes. 
In this case A and B will never go to the following state and follow D, since 
neither of them will ever learn from out-of-election peers (the exception is 
when an out-of-election peer has the same election epoch as the current 
logical clock). 

Here the system is working without the participation of A and B. 

Any help is appreciated.

thanks
Powell.

On Tuesday, January 5, 2016 8:55 AM, Flavio Junqueira <[email protected]> wrote:
Hi Powell,

I don't understand why you want to reset the values of the server vote when the 
totalOrderPredicate check fails. The values you're referring to are epoch and 
zxid?

In the example you give, it looks like you're saying that the vote of B wins 
over the vote of C and the one of C wins over the one of A, so the order is B > 
C > A, but A shouldn't take C's vote because it already took B's and B's vote 
wins. If that's the case, then this already happens. I'm probably missing the 
point here, so perhaps you could provide an example with more detail, like with 
epoch numbers and such to illustrate the point.

-Flavio



> On 01 Jan 2016, at 04:19, Powell Molleti <[email protected]> wrote:
> 
> Hi,
> I want to better understand the use of the code here: http://bit.ly/1kxjk5G
> Why should FLE reset the Vote to what is on the disk/initial values when 
> totalOrderPredicate() fails in the case of the received ElectionEpoch being 
> greater than the current vote's ElectionEpoch? 
> Going back to the initial values (and clearing the recv set) does not seem to 
> make it incorrect, but it seems to slow down FLE, if I am not mistaken. For 
> example, if B has the best totalOrderPredicate() and A learns from it, and C 
> has a higher election epoch but older values, then is A forced to reset what 
> it learned from B until C and B catch up to each other, rather than letting A 
> and B wait for C to upgrade its values after A and B borrow its ElectionEpoch? 
> Any help is appreciated.
> thanks
> Powell.
