[ 
https://issues.apache.org/jira/browse/KAFKA-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-7164:
-----------------------------------
    Description: 
Currently, a follower skips log truncation when it receives a LeaderAndIsr 
request in which the leader does not change. This can lead to log divergence if 
the follower missed a leader change before the current known leader was 
reelected. The problem is that the leader may have truncated its own log before 
becoming leader again, so the follower needs to reconcile its log with the 
leader's once more.
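
As a rough illustration of the condition at issue, the sketch below contrasts 
the check described above (truncate only when the leader changes) with a check 
on the leader epoch. The names LeaderState, truncates_on_leader_change, and 
truncates_on_epoch_change are hypothetical and are not taken from the Kafka 
code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaderState:
    """Hypothetical view of what a follower knows about the partition leader."""
    leader_id: int
    leader_epoch: int


def truncates_on_leader_change(known: LeaderState, update: LeaderState) -> bool:
    # Behavior described in this issue: truncate only if the leader is a
    # different broker than the one the follower already knows about.
    return update.leader_id != known.leader_id


def truncates_on_epoch_change(known: LeaderState, update: LeaderState) -> bool:
    # Behavior suggested by the issue title: truncate whenever the leader
    # epoch advances, even if the same broker was reelected.
    return update.leader_epoch > known.leader_epoch


# The follower last saw broker 1 as leader in epoch 0, missed epoch 1, and now
# learns that broker 1 is leader again in epoch 2.
known = LeaderState(leader_id=1, leader_epoch=0)
update = LeaderState(leader_id=1, leader_epoch=2)
```

With these inputs the first check skips truncation while the second one 
triggers it, which is the gap this issue describes.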

For example, suppose that we have three replicas: r1, r2, and r3. Initially, r1 
is the leader in epoch 0 and writes one record at offset 0. r3 replicates this 
successfully.

{code}
r1: 
  status: leader
  epoch: 0
  log: [{id: 0, offset: 0, epoch:0}]
r2: 
  status: follower
  epoch: 0
  log: []
r3: 
  status: follower
  epoch: 0
  log: [{id: 0, offset: 0, epoch:0}]
{code}

Suppose then that r2 becomes leader in epoch 1. r1 notices the leader change 
and truncates its log, but r3, for whatever reason, does not.

{code}
r1: 
  status: follower
  epoch: 1
  log: []
r2: 
  status: leader
  epoch: 1
  log: []
r3: 
  status: follower
  epoch: 0
  log: [{id: 0, offset: 0, epoch:0}]
{code}

Now suppose that r2 fails and r1 becomes the leader in epoch 2. Immediately it 
writes a new record:

{code}
r1: 
  status: leader
  epoch: 2
  log: [{id: 1, offset: 0, epoch:2}]
r2: 
  status: follower
  epoch: 2
  log: []
r3: 
  status: follower
  epoch: 0
  log: [{id: 0, offset: 0, epoch:0}]
{code}

If r3 continues fetching with the old epoch, we can have log divergence as 
noted in KAFKA-6880. However, even if r3 successfully receives the new 
LeaderAndIsr request, which updates its epoch to 2, it skips the truncation 
because the leader it knows about (r1) has not changed, and the logs stay 
inconsistent.
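
The final step of the scenario can be reproduced with a small simulation. This 
is a simplified model, not Kafka's implementation: Record, Replica, 
end_offset_for_epoch, become_follower, fetch, and run are hypothetical names, 
and the reconciliation step only loosely mirrors an epoch-based end offset 
lookup on the leader.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    id: int
    offset: int
    epoch: int


@dataclass
class Replica:
    name: str
    epoch: int = 0
    log: List[Record] = field(default_factory=list)


def end_offset_for_epoch(leader_log: List[Record], epoch: int) -> int:
    """Offset just past the leader's last record with epoch <= `epoch`.

    Returns 0 if the leader has no such record (simplifying assumption).
    """
    end = 0
    for rec in leader_log:
        if rec.epoch <= epoch:
            end = rec.offset + 1
    return end


def become_follower(follower: Replica, leader: Replica, truncate: bool) -> None:
    """Apply a leader update; optionally reconcile the follower's log first."""
    if truncate and follower.log:
        last_epoch = follower.log[-1].epoch
        end = end_offset_for_epoch(leader.log, last_epoch)
        # Drop every record at or beyond the leader's end offset for that epoch.
        follower.log = [r for r in follower.log if r.offset < end]
    follower.epoch = leader.epoch


def fetch(follower: Replica, leader: Replica) -> None:
    """Append leader records starting at the follower's log end offset.

    Assumes offsets are contiguous and start at 0, as in the example above.
    """
    next_offset = len(follower.log)
    follower.log.extend(r for r in leader.log if r.offset >= next_offset)


def run(truncate: bool) -> Tuple[List[Record], List[Record]]:
    # State after r1 is reelected in epoch 2 and writes one record.
    r1 = Replica("r1", epoch=2, log=[Record(id=1, offset=0, epoch=2)])
    r3 = Replica("r3", epoch=0, log=[Record(id=0, offset=0, epoch=0)])
    become_follower(r3, r1, truncate)
    fetch(r3, r1)
    return r1.log, r3.log
```

Under these assumptions, run(truncate=False) leaves r3 holding the stale record 
from epoch 0 at offset 0, while run(truncate=True) leaves r3 with the same log 
as r1.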





> Follower should truncate after every leader epoch change
> --------------------------------------------------------
>
>                 Key: KAFKA-7164
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7164
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
