Dong Lin created KAFKA-6604:
-------------------------------

             Summary: ReplicaManager should not remove partitions on the log 
directory from high watermark checkpoint file
                 Key: KAFKA-6604
                 URL: https://issues.apache.org/jira/browse/KAFKA-6604
             Project: Kafka
          Issue Type: Bug
            Reporter: Dong Lin
            Assignee: Dong Lin


Currently a broker may truncate a partition to the log start offset in the 
following scenario:

- Broker A is restarted after shutdown
- Controller learns that broker A has started.
- Some event (e.g. topic deletion) triggers the controller to send a 
LeaderAndIsrRequest for partition P1.
- Broker A receives the LeaderAndIsrRequest for partition P1. After receiving 
its first LeaderAndIsrRequest, the broker overwrites the HW checkpoint file 
with the HWs of all leader and follower partitions it currently knows about. 
Since P1 is the only such partition at this point, the checkpoint file now 
contains only the HW for P1.
- Controller sends broker A a LeaderAndIsrRequest for all its leader and 
follower partitions.
- Broker A creates ReplicaFetcherThreads for its follower partitions and 
truncates each log to its HW. Because the checkpoint file no longer has 
entries for them, the HW is zero for every partition except P1.
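The steps above can be modeled in a few lines. This is a minimal, hypothetical sketch, not Kafka's ReplicaManager code: the function names, partition names, and offsets are illustrative, a single log directory is assumed, the checkpoint is represented as a plain dict, and the log start offset is assumed to be 0 so that "HW missing" means "truncate to 0".

```python
"""Hypothetical, simplified model of the KAFKA-6604 scenario.

All names and offsets are illustrative assumptions, not Kafka APIs.
"""
from typing import Dict, Set


def write_hw_checkpoint_buggy(known_partitions: Dict[str, int]) -> Dict[str, int]:
    # Buggy behaviour: the checkpoint is rewritten from scratch using only the
    # partitions the broker has learned about via LeaderAndIsrRequest so far.
    return dict(known_partitions)


def truncate_followers_to_hw(log_end_offsets: Dict[str, int],
                             checkpoint: Dict[str, int],
                             followers: Set[str]) -> Dict[str, int]:
    # A follower whose HW is missing from the checkpoint falls back to 0,
    # i.e. it is truncated all the way back to the (assumed) log start offset.
    truncated = dict(log_end_offsets)
    for tp in followers:
        truncated[tp] = min(log_end_offsets[tp], checkpoint.get(tp, 0))
    return truncated


# Logs on broker A's disk (log end offsets) when it restarts.
on_disk = {"P1": 500, "P2": 800, "P3": 1200}

# First LeaderAndIsrRequest mentions only P1, so the checkpoint is overwritten
# with P1's HW alone; the HWs for P2 and P3 are lost.
checkpoint = write_hw_checkpoint_buggy({"P1": 500})

# The next request covers all partitions; followers truncate to their HW.
after = truncate_followers_to_hw(on_disk, checkpoint, {"P1", "P2", "P3"})
print(after)  # {'P1': 500, 'P2': 0, 'P3': 0}
```

In this model P2 and P3 lose all their data even though their logs were intact on disk, which is the reduced-availability outcome described in the issue.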

When this happens, potentially all logs on the broker will be truncated to the 
log start offset, and the cluster will then run with reduced availability for a 
long time until the truncated replicas catch up again.

The right solution is to keep a partition's entry in the high watermark 
checkpoint file if the partition's log still exists in LogManager.
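The proposed rule can be sketched as a merge rather than an overwrite. This is a hypothetical illustration under the same assumptions as a simple dict-based model (one log directory, illustrative names, not Kafka's actual code): entries from the previous checkpoint survive as long as their log is still present in LogManager, and partitions the broker currently tracks update their HW.

```python
"""Hypothetical sketch of the proposed HW checkpoint rule for KAFKA-6604.

Function and partition names are illustrative assumptions, not Kafka APIs.
"""
from typing import Dict, Set


def write_hw_checkpoint_fixed(previous: Dict[str, int],
                              known_partitions: Dict[str, int],
                              logs_in_log_manager: Set[str]) -> Dict[str, int]:
    # Keep existing entries for partitions whose log still exists in
    # LogManager, even if no LeaderAndIsrRequest has mentioned them yet.
    merged = {tp: hw for tp, hw in previous.items() if tp in logs_in_log_manager}
    # Partitions the broker currently tracks overwrite their old HW.
    merged.update(known_partitions)
    return merged


previous = {"P1": 500, "P2": 800, "P3": 1200, "P4": 50}
logs = {"P1", "P2", "P3"}  # P4's log no longer exists, so its entry is dropped
print(write_hw_checkpoint_fixed(previous, {"P1": 510}, logs))
# {'P1': 510, 'P2': 800, 'P3': 1200}
```

With this merge, a first LeaderAndIsrRequest that mentions only P1 no longer erases the HWs of P2 and P3, while entries for partitions whose logs are gone (such as P4 here) are still cleaned up.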



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
