Dong Lin created KAFKA-6604:
-------------------------------
Summary: ReplicaManager should not remove partitions on the log
directory from high watermark checkpoint file
Key: KAFKA-6604
URL: https://issues.apache.org/jira/browse/KAFKA-6604
Project: Kafka
Issue Type: Bug
Reporter: Dong Lin
Assignee: Dong Lin
Currently a broker may truncate a partition to its log start offset in the
following scenario:
- Broker A is restarted after shutdown
- The controller learns that broker A has started.
- Some event (e.g. topic deletion) triggers the controller to send a
LeaderAndIsrRequest for partition P1.
- Broker A receives the LeaderAndIsrRequest for partition P1. After the broker
receives its first LeaderAndIsrRequest, it overwrites the HW checkpoint file
with the leader and follower partitions it knows about at that point. Because
only P1 has been assigned so far, the checkpoint file will contain only the HW
for partition P1.
- The controller then sends broker A a LeaderAndIsrRequest for all its leader
and follower partitions.
- Broker A creates ReplicaFetcherThreads for its follower partitions and
truncates each log to its HW, which will be zero for all partitions except P1.
When this happens, potentially every log on the broker is truncated to its log
start offset, and the cluster then runs with reduced availability for a long
time.
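The sequence above can be illustrated with a toy model. This is only a sketch,
not Kafka code: the function names, the dict-based checkpoint, and the sample
offsets are all assumptions made for illustration, and a missing checkpoint
entry is simply treated as HW 0.

```python
# Toy model of the scenario. Not Kafka code; every name here is hypothetical.

def write_checkpoint(known_partitions):
    """Overwrite the HW checkpoint with only the partitions currently known."""
    return dict(known_partitions)

def truncation_offset(checkpoint, partition):
    """A follower truncates to its checkpointed HW; a missing entry reads as 0."""
    return checkpoint.get(partition, 0)

# HW checkpoint on disk before broker A restarts (sample values).
checkpoint_file = {"P1": 120, "P2": 500, "P3": 75}

# First LeaderAndIsrRequest after restart names only P1, so only P1 is known
# when the checkpoint file is overwritten.
known = {"P1": checkpoint_file["P1"]}
checkpoint_file = write_checkpoint(known)

# The later LeaderAndIsrRequest covers every partition; followers now look up
# their HW in the overwritten checkpoint.
offsets = {p: truncation_offset(checkpoint_file, p) for p in ["P1", "P2", "P3"]}
print(offsets)  # {'P1': 120, 'P2': 0, 'P3': 0}
```

In this model P2 and P3 lose their checkpointed HW as soon as the file is
overwritten, which is why they are later truncated from offset zero.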
The right solution is to keep a partition in the high watermark checkpoint
file if that partition exists in LogManager.
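One possible reading of this proposal, as a minimal sketch: when rewriting the
checkpoint, carry over the previous HW of any partition that still has a log.
This is not the actual patch. The function name, its parameters, and the sample
values are hypothetical, and a plain set stands in for the partitions that
exist in LogManager.

```python
# Sketch of the proposed behaviour. Not the real fix; all names are hypothetical.

def write_checkpoint_keeping_logs(previous_checkpoint, known_partitions, log_partitions):
    """Rewrite the HW checkpoint without dropping partitions that still have a log.

    previous_checkpoint: partition -> HW read from the existing checkpoint file
    known_partitions:    partition -> HW for partitions assigned so far
    log_partitions:      set of partitions that have a log on this broker
    """
    new_checkpoint = dict(known_partitions)
    for partition, hw in previous_checkpoint.items():
        # Keep the old entry only if the partition's log still exists.
        if partition not in new_checkpoint and partition in log_partitions:
            new_checkpoint[partition] = hw
    return new_checkpoint

previous = {"P1": 120, "P2": 500, "P3": 75, "P4": 10}
known = {"P1": 120}             # only P1 has been assigned so far
on_disk = {"P1", "P2", "P3"}    # P4 no longer has a log on this broker

merged = write_checkpoint_keeping_logs(previous, known, on_disk)
print(merged)  # {'P1': 120, 'P2': 500, 'P3': 75}
```

Entries for partitions without a log (P4 here) are still dropped, so the
checkpoint file does not accumulate stale partitions.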