[ https://issues.apache.org/jira/browse/KAFKA-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866955#comment-17866955 ]

fujian.zfj commented on KAFKA-17158:
------------------------------------

I think this is a bug triggered by an edge-case sequence of events. The problem
is likely caused by the following:
1. Calling the ALTER_REPLICA_LOG_DIRS API to change the partition's log
directory creates a ReplicaAlterLogDirsThread to migrate the data, and cleaning
for the partition is paused via
logManager.abortAndPauseCleaning(topicPartition).
 !screenshot-1.png! 
2. Before the migration completes, the partition's leader changes. After the
current broker becomes a follower, it tries to create a ReplicaAlterLogDirsThread
again to finish the migration, and at the same time calls
logManager.abortAndPauseCleaning(topicPartition) a second time to pause the
partition's cleaning. Because a ReplicaAlterLogDirsThread was already created,
the existing thread is reused here, but the second pause is still recorded.
 !screenshot-2.png! 
3. When the migration task completes, logManager.resumeCleaning(topicPartition)
is called to resume cleaning for the partition. However, because
logManager.abortAndPauseCleaning(topicPartition) was called twice, a single
resume only decrements the pause count and leaves the state at
LogCleaningPaused(count)=1. As a result, the cleanupLogs method skips deleting
this partition from then on.
 !screenshot-3.png! 
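The three steps above can be modelled as a pause counter. The sketch below is a simplified, hypothetical model (the class and method names are illustrative, not Kafka's code) that assumes, as described above, that each abortAndPauseCleaning call adds one pause, each resumeCleaning call removes one, and retention cleanup skips any partition that still has a pause recorded.

```python
from typing import Dict, Set


class CleanerStateModel:
    """Hypothetical, simplified model of per-partition cleaning pause counts."""

    def __init__(self) -> None:
        # partition name -> number of outstanding pauses
        self.paused: Dict[str, int] = {}

    def abort_and_pause_cleaning(self, tp: str) -> None:
        # Every call adds one more pause, even if the partition is already paused.
        self.paused[tp] = self.paused.get(tp, 0) + 1

    def resume_cleaning(self, tp: str) -> None:
        # One resume undoes exactly one pause.
        count = self.paused.get(tp, 0)
        if count <= 1:
            self.paused.pop(tp, None)
        else:
            self.paused[tp] = count - 1

    def deletable(self, partitions: Set[str]) -> Set[str]:
        # Partitions with any outstanding pause are skipped by retention cleanup.
        return {tp for tp in partitions if tp not in self.paused}


tp = "flow_pageview-9"
model = CleanerStateModel()
model.abort_and_pause_cleaning(tp)  # step 1: ALTER_REPLICA_LOG_DIRS starts the move
model.abort_and_pause_cleaning(tp)  # step 2: leader change, thread reused, paused again
model.resume_cleaning(tp)           # step 3: move completes, only one resume
print(model.paused)                 # {'flow_pageview-9': 1}
print(model.deletable({tp}))        # set()
```

In this model the partition only becomes deletable again once every pause has been matched by a resume.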

> Method 'cleanupLogs' can not delete old logSegements after invoking 
> ALTER_REPLICA_LOG_DIRS
> ------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17158
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17158
>             Project: Kafka
>          Issue Type: Bug
>          Components: log cleaner
>    Affects Versions: 2.6.3
>            Reporter: fujian.zfj
>            Priority: Critical
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> After ALTER_REPLICA_LOG_DIRS was invoked, partition flow_pageview-9 was being
> moved from /data1 to /data2. After the ReplicaAlterLogDirsThread was created,
> the leader of partition flow_pageview-9 changed from broker 58 to broker 36.
> Since then, log segments and indexes in /data2/flow_pageview-9 are no longer
> being deleted.
> The config of topic flow_pageview is:
> cleanup.policy=delete
> retention.ms=3600000
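With cleanup.policy=delete and retention.ms=3600000 (one hour), a segment should become eligible for deletion about an hour after its newest record. The sketch below is a simplified, hypothetical approximation of that time-based check (the function names are illustrative, and other rules such as retention.bytes are ignored). It shows that a segment can breach retention and still be kept while the partition's cleaning remains paused.

```python
RETENTION_MS = 3_600_000  # retention.ms from the flow_pageview topic config (1 hour)


def retention_breached(now_ms: int, largest_ts_ms: int,
                       retention_ms: int = RETENTION_MS) -> bool:
    # A segment is past retention once its newest record is older than retention.ms.
    return now_ms - largest_ts_ms > retention_ms


def should_delete(now_ms: int, largest_ts_ms: int, cleaning_paused: bool,
                  retention_ms: int = RETENTION_MS) -> bool:
    # Even a segment past retention is skipped while the partition is paused.
    return (not cleaning_paused) and retention_breached(now_ms, largest_ts_ms, retention_ms)


# A segment whose newest record is 5,000,000 ms (~83 minutes) old:
print(should_delete(10_000_000, 5_000_000, cleaning_paused=False))  # True
print(should_delete(10_000_000, 5_000_000, cleaning_paused=True))   # False
```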



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
