[ 
https://issues.apache.org/jira/browse/KUDU-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712812#comment-15712812
 ] 

Todd Lipcon commented on KUDU-1778:
-----------------------------------

Actually, I noticed this in the follower logs:

{code}
I1201 03:05:52.344377 168075 consensus_queue.cc:167] T 07b3624f00864ab18f984364ed6e2d11 P a1a2d4b5585a4ac2a4d6e4d9a02fce6b [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated index: 0, Majority replicated index: 0, Committed index: 0, Last appended: 111.11350, Current term: 0, Majority size: -1, State: 1, Mode: NON_LEADER
{code}

So I guess that if a replica starts up with a divergent log, it will report 
committed_index=0 unless it gets elected leader first?
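
For scanning other follower logs for the same symptom, something like the following might help. It is only a sketch: the helper name is made up, the regex is based solely on the one log line above, and it assumes "Last appended" is printed as term.index.

```python
import re

# Matches the queue-state summary as it appears in the log line above.
# Assumption: "Last appended" is formatted as <term>.<index>.
STATE_RE = re.compile(
    r"Committed index: (?P<committed>\d+), "
    r"Last appended: (?P<term>\d+)\.(?P<index>\d+)"
)

def reports_zero_committed(log_line: str) -> bool:
    """True if the replica has appended ops but reports committed index 0."""
    # Collapse whitespace so a line that was wrapped by email still matches.
    normalized = " ".join(log_line.split())
    m = STATE_RE.search(normalized)
    if m is None:
        return False
    return int(m.group("committed")) == 0 and int(m.group("index")) > 0

line = (
    "I1201 03:05:52.344377 168075 consensus_queue.cc:167] T "
    "07b3624f00864ab18f984364ed6e2d11 P a1a2d4b5585a4ac2a4d6e4d9a02fce6b "
    "[NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated index: 0, "
    "Majority replicated index: 0, Committed index: 0, Last appended: 111.11350, "
    "Current term: 0, Majority size: -1, State: 1, Mode: NON_LEADER"
)
print(reports_zero_committed(line))  # True
```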

> Consensus "stuck" after a leader election when both peers were divergent
> ------------------------------------------------------------------------
>
>                 Key: KUDU-1778
>                 URL: https://issues.apache.org/jira/browse/KUDU-1778
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 1.1.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>
> On a stress cluster we saw the following sequence of events following a 
> service restart while under load:
> - a peer is elected leader successfully
> - both of its followers have divergent logs
> - when it connects to a new peer with a divergent log, it decides to fall 
> back to index 0 rather than falling back to the proper committed index of 
> that peer
> - upon falling back to index 0, replication will never succeed since the 
> first segment of the log was already GCed long ago.
> Thus, the leader thinks that it needs to evict both of the followers and 
> can't replicate to them, and the tablet gets "stuck".
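
The sequence described above can be illustrated with a toy model. This is not Kudu's code: the function name, the GC boundary (9000), and the "proper" committed index (11300) are hypothetical values picked for illustration. Only the fallback to index 0 comes from the report.

```python
from typing import Optional

def first_op_to_resend(follower_committed_index: int,
                       leader_first_retained_index: int) -> Optional[int]:
    """Toy model of a leader picking where to resume replication to a
    follower whose log diverged.

    Returns the index of the first op to resend, or None if that op was
    already garbage collected from the leader's log.
    """
    resume_from = follower_committed_index + 1
    if resume_from < leader_first_retained_index:
        # The leader no longer has this op, so it can never catch the
        # follower up from here.
        return None
    return resume_from

# Hypothetical: the leader has GCed every op below index 9000.
FIRST_RETAINED = 9000

# Falling back to index 0, as in the report: the needed op is gone.
print(first_op_to_resend(0, FIRST_RETAINED))      # None

# Falling back to a (hypothetical) real committed index of the follower.
print(first_op_to_resend(11300, FIRST_RETAINED))  # 11301
```

In the first case the leader has nothing it can send, which matches the "stuck" state; in the second it only needs to resend the divergent tail.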



