[ https://issues.apache.org/jira/browse/KAFKA-6604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382709#comment-16382709 ]
ASF GitHub Bot commented on KAFKA-6604:
---------------------------------------

lindong28 opened a new pull request #4634: KAFKA-6604; ReplicaManager should not remove partitions on the log directory from high watermark checkpoint file
URL: https://github.com/apache/kafka/pull/4634

*Summary of testing strategy (including rationale) for the feature or bug fix. Unit and/or integration tests are expected for any behaviour change and system tests should be considered for larger changes.*

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> ReplicaManager should not remove partitions on the log directory from high
> watermark checkpoint file
> ---------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6604
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6604
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>            Priority: Major
>
> Currently a broker may truncate a partition to its log start offset in the
> following scenario:
> - Broker A is restarted after a shutdown.
> - The controller learns that broker A has started.
> - Some event (e.g. topic deletion) triggers the controller to send a
> LeaderAndIsrRequest for partition P1.
> - Broker A receives the LeaderAndIsrRequest for partition P1. After receiving
> this first LeaderAndIsrRequest, the broker overwrites the HW checkpoint file
> with the HWs of all the partitions it currently knows it leads or follows.
> Since P1 is the only such partition at this point, the checkpoint file will
> contain only the HW for partition P1.
> - The controller then sends broker A a LeaderAndIsrRequest for all of its
> leader and follower partitions.
> - The broker creates ReplicaFetcherThreads for its follower partitions and
> truncates each follower log to its HW, which will be zero for every partition
> except P1 because their entries are missing from the checkpoint file.
>
> When this happens, potentially all logs on the broker will be truncated to
> the log start offset, and the cluster will then run with reduced availability
> for a long time.
>
> The right solution is to keep a partition in the high watermark checkpoint
> file if the partition still exists in LogManager.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
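The failure mode and the proposed fix can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Kafka's ReplicaManager or LogManager code: partitions are plain strings such as "P1", the checkpoint file is an in-memory `Map`, and the method names `checkpointOnlyKnown`, `checkpointKeepingLoggedPartitions` and `truncationOffset` are invented for illustration. It also simplifies by treating a partition missing from the checkpoint as having HW 0, with the log start offset assumed to be 0.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the KAFKA-6604 scenario; none of these names are Kafka APIs.
public class HwCheckpointSketch {

    // Buggy behaviour: overwrite the checkpoint with only the partitions the broker
    // currently knows it leads or follows (those named in LeaderAndIsrRequests
    // received since restart). Entries for every other partition are lost.
    static Map<String, Long> checkpointOnlyKnown(Map<String, Long> knownHws) {
        return new HashMap<>(knownHws);
    }

    // Fixed behaviour, as proposed in the issue: keep an existing checkpoint entry
    // if the partition still has a log in the (modelled) LogManager, then overlay
    // the current in-memory HWs of the partitions the broker knows about.
    static Map<String, Long> checkpointKeepingLoggedPartitions(
            Map<String, Long> previousCheckpoint,
            Set<String> partitionsInLogManager,
            Map<String, Long> knownHws) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> e : previousCheckpoint.entrySet()) {
            if (partitionsInLogManager.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        result.putAll(knownHws); // current HWs win over stale checkpoint values
        return result;
    }

    // Follower truncation target: a partition missing from the checkpoint falls
    // back to HW 0 (simplification: log start offset assumed to be 0).
    static long truncationOffset(Map<String, Long> checkpoint, String partition) {
        return checkpoint.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        // Checkpoint on disk before broker A restarts; all three logs still exist.
        Map<String, Long> onDisk = Map.of("P1", 500L, "P2", 1200L, "P3", 800L);
        Set<String> logs = Set.of("P1", "P2", "P3");
        // The first LeaderAndIsrRequest after restart only mentions P1.
        Map<String, Long> known = Map.of("P1", 510L);

        Map<String, Long> buggy = checkpointOnlyKnown(known);
        Map<String, Long> fixed = checkpointKeepingLoggedPartitions(onDisk, logs, known);

        for (String p : new String[] {"P1", "P2", "P3"}) {
            System.out.printf("%s: buggy truncates to %d, fixed truncates to %d%n",
                    p, truncationOffset(buggy, p), truncationOffset(fixed, p));
        }
    }
}
```

Running `main` prints a truncation target of 0 for P2 and P3 under the buggy behaviour, while the fixed behaviour keeps their checkpointed HWs (1200 and 800). Filtering on `partitionsInLogManager` rather than keeping every old entry is the design choice that still lets entries for partitions whose logs were removed (e.g. after topic deletion) drop out of the checkpoint.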