[
https://issues.apache.org/jira/browse/KAFKA-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866955#comment-17866955
]
fujian.zfj commented on KAFKA-17158:
------------------------------------
I think this is a bug that is triggered only by an unusual sequence of events.
The likely cause is as follows:
1. The ALTER_REPLICA_LOG_DIRS API is called to change the partition's storage
path. A ReplicaAlterLogDirsThread is created to migrate the data, and cleaning
of the partition is paused through
logManager.abortAndPauseCleaning(topicPartition).
!screenshot-1.png!
2. Before the migration completes, the leader of the partition changes. After
the current node becomes a follower, it tries to create a
ReplicaAlterLogDirsThread again to complete the data migration, and it calls
logManager.abortAndPauseCleaning(topicPartition) a second time to pause
cleaning of the partition. Since a ReplicaAlterLogDirsThread already exists,
the existing thread is reused instead of a new one being created.
!screenshot-2.png!
3. When the migration task completes,
logManager.resumeCleaning(topicPartition) is called once to resume cleaning of
the partition. However, since
logManager.abortAndPauseCleaning(topicPartition) was called twice before, the
single resume only decrements the pause count from 2 to 1, leaving the state
at LogCleaningPaused(count)=1. The partition therefore remains paused, and the
cleanupLogs method skips it from then on.
!screenshot-3.png!
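The three steps above can be modeled with a small counter. This is only a
sketch of the counting behavior described in this comment, not Kafka's actual
code: the class name CleaningPauseSketch and the helper methods isPaused,
outstandingPauses and replayScenario are hypothetical, and only the names
abortAndPauseCleaning and resumeCleaning are borrowed from the calls mentioned
above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of per-partition pause counting.
// It is NOT Kafka's implementation; it only mirrors the behavior described above.
final class CleaningPauseSketch {
    // Partition name -> number of outstanding pause requests.
    private final Map<String, Integer> pausedCount = new HashMap<>();

    // Each call adds one outstanding pause for the partition.
    void abortAndPauseCleaning(String topicPartition) {
        pausedCount.merge(topicPartition, 1, Integer::sum);
    }

    // Each call removes one outstanding pause; cleaning resumes only at zero.
    void resumeCleaning(String topicPartition) {
        Integer count = pausedCount.get(topicPartition);
        if (count == null) {
            throw new IllegalStateException(topicPartition + " is not paused");
        }
        if (count == 1) {
            pausedCount.remove(topicPartition);
        } else {
            pausedCount.put(topicPartition, count - 1);
        }
    }

    // A partition with any outstanding pause would be skipped by cleanup.
    boolean isPaused(String topicPartition) {
        return pausedCount.containsKey(topicPartition);
    }

    int outstandingPauses(String topicPartition) {
        return pausedCount.getOrDefault(topicPartition, 0);
    }

    // Replays the sequence from the comment and returns the leftover pause count.
    static int replayScenario() {
        CleaningPauseSketch manager = new CleaningPauseSketch();
        String tp = "flow_pageview-9";
        manager.abortAndPauseCleaning(tp); // step 1: migration starts
        manager.abortAndPauseCleaning(tp); // step 2: leader change, thread reused
        manager.resumeCleaning(tp);        // step 3: migration completes, one resume
        return manager.outstandingPauses(tp);
    }
}
```

In this model, replayScenario() leaves one outstanding pause, so isPaused
stays true for the partition. A fix would need either to avoid the second
pause when the existing thread is reused or to issue a matching second resume.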
> Method 'cleanupLogs' can not delete old logSegements after invoking
> ALTER_REPLICA_LOG_DIRS
> ------------------------------------------------------------------------------------------
>
> Key: KAFKA-17158
> URL: https://issues.apache.org/jira/browse/KAFKA-17158
> Project: Kafka
> Issue Type: Bug
> Components: log cleaner
> Affects Versions: 2.6.3
> Reporter: fujian.zfj
> Priority: Critical
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> After invoking ALTER_REPLICA_LOG_DIRS, partition flow_pageview-9 is moved
> from /data1 to /data2. While the ReplicaAlterLogDirsThread is active, the
> leader of partition flow_pageview-9 changes from broker 58 to broker 36. After
> that, log segments and indexes in /data2/flow_pageview-9 are no longer
> deleted.
> The config of topic flow_pageview is:
> cleanup.policy=delete
> retention.ms=3600000
--
This message was sent by Atlassian Jira
(v8.20.10#820010)