showuon opened a new pull request, #15951:
URL: https://github.com/apache/kafka/pull/15951

   When altering replica logDirs, we create a future log and pause log 
cleaning for the partition 
([here](https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/server/ReplicaManager.scala#L1200)).
 Log cleaning is resumed after the alter replica logDirs operation 
completes 
([here](https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogManager.scala#L1254)).
 Each resume decrements the LogCleaningPaused count by 1, and cleaning 
actually resumes only once the count reaches 0 
([here](https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogCleanerManager.scala#L310)).
 For more details about the LogCleaningPaused state, see 
[here](https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogCleanerManager.scala#L55).
   
    However, one other event can also increase the LogCleaningPaused 
count: a leadership change 
([here](https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/server/ReplicaManager.scala#L2126)).
 On a leadership change, we check whether the partition has a future log; 
if so, we create the future log and call pauseCleaning (LogCleaningPaused 
count + 1). So the following can happen during alter replica logDirs:
   
       1. alter replica logDirs for tp0 is triggered (LogCleaningPaused count = 1)
       2. tp0 leadership changes (LogCleaningPaused count = 2)
       3. alter replica logDirs completes and resumes log cleaning 
(LogCleaningPaused count = 1)
       4. Log cleaning stays paused because the count never drops back to 0
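
The sequence above can be reproduced with a toy reference counter. This is only a minimal sketch of the counting behaviour described in this PR, not Kafka's `LogCleanerManager`; `PauseCounter`, `PauseLeakDemo`, and their methods are hypothetical names invented for illustration.

```scala
import scala.collection.mutable

// Toy model of a reference-counted pause state (hypothetical, not Kafka code).
final class PauseCounter {
  private val paused = mutable.Map.empty[String, Int]

  // Every pause request adds 1 to the partition's count.
  def pause(tp: String): Unit =
    paused.update(tp, paused.getOrElse(tp, 0) + 1)

  // Every resume subtracts 1; the entry is dropped once the count reaches 0.
  def resume(tp: String): Unit = {
    val n = paused.getOrElse(tp, throw new IllegalStateException(s"$tp is not paused"))
    if (n == 1) paused.remove(tp) else paused.update(tp, n - 1)
  }

  def count(tp: String): Int = paused.getOrElse(tp, 0)
  def isPaused(tp: String): Boolean = count(tp) > 0
}

object PauseLeakDemo {
  // Replays steps 1-3 from the list above and returns the resulting counter.
  def run(): PauseCounter = {
    val counter = new PauseCounter
    counter.pause("tp0")  // 1. alter replica logDirs triggered
    counter.pause("tp0")  // 2. leadership change pauses again
    counter.resume("tp0") // 3. alter replica logDirs completes
    counter
  }

  def main(args: Array[String]): Unit = {
    val counter = run()
    println(s"count=${counter.count("tp0")}, paused=${counter.isPaused("tp0")}")
  }
}
```

With two pauses and a single resume, the toy counter ends at 1, so the partition is still reported as paused (step 4).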
   
   This PR fixes the issue by calling `abortAndPauseCleaning` only when the 
future log does not already exist. `alterReplicaLogDirs` already performs the 
same check, so this change ensures `abortAndPauseCleaning` is called only once, 
from either `alterReplicaLogDirs` or `maybeAddLogDirFetchers`. Tests are also added.
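
The idea behind the guard can be sketched as follows. This is an illustrative model under stated assumptions, not the actual change in `ReplicaManager`: `ToyReplica`, `maybeCreateFutureLog`, `onLeadershipChange`, `completeAlter`, and `GuardedPauseDemo` are hypothetical names, and only the "pause only when the future log is newly created" rule is taken from the description above.

```scala
import scala.collection.mutable

// Toy model of the guarded pause (hypothetical, not Kafka code).
final class ToyReplica {
  private val futureLogs = mutable.Set.empty[String]
  private val pauseCounts = mutable.Map.empty[String, Int]

  // Shared by both entry points: pause cleaning only if the future log
  // did not exist yet. Set.add returns true only for a newly added element.
  private def maybeCreateFutureLog(tp: String): Unit =
    if (futureLogs.add(tp))
      pauseCounts.update(tp, pauseCount(tp) + 1)

  def alterReplicaLogDirs(tp: String): Unit = maybeCreateFutureLog(tp)
  def onLeadershipChange(tp: String): Unit = maybeCreateFutureLog(tp)

  // Completing the move removes the future log and resumes cleaning once.
  def completeAlter(tp: String): Unit =
    if (futureLogs.remove(tp)) {
      val n = pauseCount(tp) - 1
      if (n == 0) pauseCounts.remove(tp) else pauseCounts.update(tp, n)
    }

  def pauseCount(tp: String): Int = pauseCounts.getOrElse(tp, 0)
}

object GuardedPauseDemo {
  // Same event order as the failing scenario; returns the final pause count.
  def run(): Int = {
    val replica = new ToyReplica
    replica.alterReplicaLogDirs("tp0") // future log created, count = 1
    replica.onLeadershipChange("tp0")  // future log exists, count stays 1
    replica.completeAlter("tp0")       // count back to 0
    replica.pauseCount("tp0")
  }

  def main(args: Array[String]): Unit =
    println(s"final pause count = ${run()}")
}
```

Because the second event finds the future log already present, the pause and the resume stay paired and the count returns to 0.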
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

