[ https://issues.apache.org/jira/browse/KAFKA-10706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson resolved KAFKA-10706.
-------------------------------------
Fix Version/s: 2.7.1, 2.6.1, 2.5.2, 2.4.2
Resolution: Fixed
> Liveness bug in truncation protocol can lead to indefinite URP
> --------------------------------------------------------------
>
> Key: KAFKA-10706
> URL: https://issues.apache.org/jira/browse/KAFKA-10706
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
> Priority: Major
> Fix For: 2.4.2, 2.5.2, 2.6.1, 2.7.1
>
>
> We hit an interesting liveness condition in the truncation protocol. Broker A
> was leader in epoch 7, broker B was leader in epoch 8, and then broker A was
> leader in epoch 9 again.
> On broker A, we had the following state in the epoch cache:
> {code}
> epoch 4, start offset 3953
> epoch 7, start offset 3983
> epoch 9, start offset 3988
> {code}
> On broker B, we had the following:
> {code}
> epoch 4, start offset 3953
> epoch 8, start offset 3983
> {code}
> After A was elected, broker B sent epoch 8 in OffsetsForLeaderEpoch. Broker A
> correctly responded with epoch 7 ending at offset 3988. The end offset on
> broker B was in fact 3983, so this truncation had no effect. Broker B then
> retried with epoch 8 again and replication was stuck.
> When a replica becomes leader, it first inserts an entry into the epoch cache
> with the current log end offset. This ensures that it has a larger epoch
> in the cache than any epoch that could be requested by a valid replica.
> However, I think it is incorrect to turn around and use this epoch when
> becoming a follower. It seems like we need symmetric logic after becoming a
> follower to remove this epoch entry.
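
The exchange described above can be replayed with a small toy model. This is
only a sketch under stated assumptions. The names TruncationSketch, Replica,
endOffsetFor, truncateOnce and dropEmptyLatestEpoch are hypothetical and are
not Kafka's classes. The lookup is a simplified version of the leader-side
epoch lookup, and broker A's log end offset is assumed to be 3988. The last
method illustrates the "symmetric logic" idea from the report; it is not
necessarily how the committed fix works.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the scenario in the report; hypothetical names, not Kafka code. */
final class TruncationSketch {

    /** Epoch cache (epoch -> start offset) plus the log end offset of one replica. */
    static final class Replica {
        final TreeMap<Integer, Long> epochs = new TreeMap<>();
        long logEndOffset;

        Replica(long logEndOffset, long... epochAndStartOffsetPairs) {
            this.logEndOffset = logEndOffset;
            for (int i = 0; i < epochAndStartOffsetPairs.length; i += 2) {
                epochs.put((int) epochAndStartOffsetPairs[i], epochAndStartOffsetPairs[i + 1]);
            }
        }
    }

    /** Simplified leader-side lookup: returns {epoch, endOffset}, or null if unknown. */
    static long[] endOffsetFor(Replica leader, int requestedEpoch) {
        if (leader.epochs.isEmpty()) return null;
        if (leader.epochs.lastKey() == requestedEpoch) {
            return new long[] {requestedEpoch, leader.logEndOffset};
        }
        // The requested epoch ends where the next larger epoch starts.
        Map.Entry<Integer, Long> higher = leader.epochs.higherEntry(requestedEpoch);
        if (higher == null) return null;
        Map.Entry<Integer, Long> floor = leader.epochs.floorEntry(requestedEpoch);
        int epoch = (floor == null) ? requestedEpoch : floor.getKey();
        return new long[] {epoch, higher.getValue()};
    }

    /** One follower round using its latest cached epoch; true means the epochs agreed. */
    static boolean truncateOnce(Replica leader, Replica follower) {
        int requested = follower.epochs.lastKey();
        long[] reply = endOffsetFor(leader, requested);
        if (reply == null) return false;
        long target = Math.min(reply[1], follower.logEndOffset);
        if (target < follower.logEndOffset) {
            // Real truncation: drop records and any epoch entries at or past the target.
            follower.logEndOffset = target;
            follower.epochs.values().removeIf(start -> start >= target);
        }
        // Otherwise the truncation is a no-op and the epoch cache is left untouched.
        return reply[0] == requested;
    }

    /** Hypothetical step on becoming follower: drop a latest epoch that holds no records. */
    static void dropEmptyLatestEpoch(Replica replica) {
        Map.Entry<Integer, Long> last = replica.epochs.lastEntry();
        if (last != null && last.getValue() >= replica.logEndOffset) {
            replica.epochs.remove(last.getKey());
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica(3988, 4, 3953, 7, 3983, 9, 3988); // leader in epoch 9
        Replica b = new Replica(3983, 4, 3953, 8, 3983);          // follower

        long[] reply = endOffsetFor(a, 8);
        System.out.println("A replies: epoch " + reply[0] + ", end offset " + reply[1]);

        for (int round = 1; round <= 3; round++) {
            boolean done = truncateOnce(a, b);
            System.out.println("round " + round + ": done=" + done
                    + ", B latest epoch=" + b.epochs.lastKey());
        }

        dropEmptyLatestEpoch(b);
        System.out.println("after dropping empty epoch: done=" + truncateOnce(a, b));
    }
}
```

In this model every round without the extra step requests epoch 8, receives
epoch 7 ending at 3988, and changes nothing on broker B, which matches the
stuck behaviour in the report. Once the empty epoch 8 entry is dropped,
broker B requests epoch 4, broker A answers with epoch 4 ending at 3983, and
the round completes.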