[
https://issues.apache.org/jira/browse/KUDU-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Percy resolved KUDU-639.
-----------------------------
Resolution: Fixed
> Leader doesn't overwrite demoted follower's log properly
> --------------------------------------------------------
>
> Key: KUDU-639
> URL: https://issues.apache.org/jira/browse/KUDU-639
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: M4.5
> Reporter: David Alves
> Assignee: Todd Lipcon
> Priority: Minor
> Fix For: M5
>
>
> We just ran into this situation in the YCSB cluster, which is apparently a
> log divergence.
> We have nodes a, b, c (corresponding to nodes
> 33c8fb1dc4434df0938ccc27ecfd58a1/a1219,
> 4ed2e09f80e04d198edeb53e15b3539e/a1220,
> ab8ed89f9041495a95b8d2b77591c9d7/a1215).
> Node a is leader for term 3, times out
> Node b is elected leader for term 5 with votes from b, c
> When b is elected leader the log state is:
> State: All replicated op: 3.6546, Majority replicated op: 3.6533, Committed
> index: 3.6533, Last appended: 3.6546, Current term: 5
> b never actually replicates anything and eventually loses leadership to node
> a, again.
> When b loses leadership its WAL is in the following state:
> State: All replicated op: 0.0, Majority replicated op: 3.6533, Committed
> index: 3.6533, Last appended: 5.6547, Current term: 5
> That is, b appended a message in term 5 but never actually got to commit it.
> However, if we look at b's log we find a message in term 5 committed:
> 3.6546@99404 REPLICATE WRITE_OP
> COMMIT 3.6533
> 5.6547@99789 REPLICATE CHANGE_CONFIG_OP
> COMMIT 3.6535
> COMMIT 3.6536
> COMMIT 3.6537
> COMMIT 3.6538
> COMMIT 3.6534
> COMMIT 3.6541
> COMMIT 3.6540
> COMMIT 3.6543
> COMMIT 3.6542
> COMMIT 3.6545
> COMMIT 3.6546
> COMMIT 3.6544
> COMMIT 3.6539
> COMMIT 5.6547
> 3.6548@99430 REPLICATE WRITE_OP
> 6.6549@99795 REPLICATE CHANGE_CONFIG_OP
> And more problematically, that diverges from the other two nodes' logs:
> 3.6546@99404 REPLICATE WRITE_OP
> COMMIT 3.6533
> COMMIT 3.6536
> COMMIT 3.6537
> COMMIT 3.6535
> COMMIT 3.6539
> COMMIT 3.6538
> COMMIT 3.6534
> COMMIT 3.6541
> COMMIT 3.6540
> COMMIT 3.6543
> COMMIT 3.6542
> COMMIT 3.6544
> 3.6547@99429 REPLICATE WRITE_OP
> 3.6548@99430 REPLICATE WRITE_OP
> 6.6549@99795 REPLICATE CHANGE_CONFIG_OP
> 6.6550@99878 REPLICATE WRITE_OP
> 6.6551@99879 REPLICATE WRITE_OP
> 6.6552@99880 REPLICATE WRITE_OP
> COMMIT 3.6545
> COMMIT 3.6548
> COMMIT 3.6547
> COMMIT 3.6546
> COMMIT 6.6549
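
The divergence described above can be located mechanically by comparing the term recorded for each REPLICATE index in two log dumps. The following is a minimal sketch, not Kudu code: the helper names (replicate_terms, first_divergence) are hypothetical, the parser assumes the "term.index@... REPLICATE" line shape shown in the excerpts, and the value after "@" is simply ignored. The sample inputs are abbreviated copies of the dumps quoted above.

```python
import re

# Matches dump lines such as "3.6546@99404 REPLICATE WRITE_OP".
# Assumption: the text before "@" is "<term>.<index>"; the value after "@" is ignored.
REPLICATE_RE = re.compile(r"^(\d+)\.(\d+)@\d+\s+REPLICATE\b")


def replicate_terms(dump_lines):
    """Map log index -> term for every REPLICATE line in a dump.

    COMMIT lines and anything else that does not match are skipped.
    """
    terms = {}
    for line in dump_lines:
        match = REPLICATE_RE.match(line.strip())
        if match:
            terms[int(match.group(2))] = int(match.group(1))
    return terms


def first_divergence(dump_a, dump_b):
    """Return (index, term_a, term_b) for the lowest shared index whose
    terms differ, or None if the two dumps agree on every shared index."""
    a = replicate_terms(dump_a)
    b = replicate_terms(dump_b)
    for index in sorted(a.keys() & b.keys()):
        if a[index] != b[index]:
            return index, a[index], b[index]
    return None


# Abbreviated excerpts from the dumps quoted in the issue description.
node_b = [
    "3.6546@99404 REPLICATE WRITE_OP",
    "COMMIT 3.6533",
    "5.6547@99789 REPLICATE CHANGE_CONFIG_OP",
    "COMMIT 5.6547",
    "3.6548@99430 REPLICATE WRITE_OP",
    "6.6549@99795 REPLICATE CHANGE_CONFIG_OP",
]
other_nodes = [
    "3.6546@99404 REPLICATE WRITE_OP",
    "3.6547@99429 REPLICATE WRITE_OP",
    "3.6548@99430 REPLICATE WRITE_OP",
    "6.6549@99795 REPLICATE CHANGE_CONFIG_OP",
]

print(first_divergence(node_b, other_nodes))  # (6547, 5, 3)
```

On these excerpts the first mismatch is index 6547: node b holds it in term 5, while the other two nodes hold it in term 3.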