[
https://issues.apache.org/jira/browse/KAFKA-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mickael Maison resolved KAFKA-7122.
-----------------------------------
Resolution: Won't Fix
We are now removing ZooKeeper support so closing this issue.
> Data is lost when ZooKeeper times out
> -------------------------------------
>
> Key: KAFKA-7122
> URL: https://issues.apache.org/jira/browse/KAFKA-7122
> Project: Kafka
> Issue Type: Bug
> Components: core, replication
> Affects Versions: 0.11.0.2
> Reporter: Nick Lipple
> Priority: Blocker
>
> Noticed that a kafka cluster will lose data when a leader for a partition has
> their zookeeper connection timeout.
> Sequence of events:
> # Say broker A leads a partition followed by brokers B and C
> # A ZK node has a network issue, happens to be the node used by broker A.
> Lets say this happens at offset X
> # Kafka Controller immediately selects broker C as the new partition leader
> # Broker A does not timeout from zookeeper for another 4 seconds. Broker A
> still thinks it is the leader, presumably accepting producer writes.
> # Broker A detects the ZK timeout and leaves the ISR.
> # Broker A reconnects to ZK, rejoins cluster as follower for partition
> # Broker A truncates log to some offset Y such that Y > X. Broker A proceeds
> to catch up normally and becomes an ISR
> # ISRs for partition are now in an inconsistent state:
> ## Broker C has all offsets X through Y plus everything after
> ## Broker B has all offsets X through Y plus everything after
> ## Broker A has offsets up to X and after Y. Everything between X and Y *IS
> MISSING*
> # Within 5 minutes, controller trigger preferred replica election making
> Broker A the new leader for partition (this is default behavior)
> All consumers after step 9 will not receive any messages for offsets between
> X and Y.
>
> The root problem here seems to be broker A truncates to offset Y when
> rejoining the cluster. It should be truncating further back to offset X to
> prevent data loss
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)