ctrlaltluc opened a new pull request, #14543:
URL: https://github.com/apache/kafka/pull/14543

   ## Description
   
   This PR fixes a race condition between:
   - the log rename dir logic, which can be called during alter replica log dirs or during log dir delete
   - the log flush dir logic, which is called to force an fsync when new segments are rolled
   
   This PR supersedes the previous fix in https://github.com/apache/kafka/pull/14280. That PR fixed a similar race condition (only between log flush and log delete) by swallowing `NoSuchFileException`, to avoid the log dir going offline. That fix was correct for the race between log flush and log delete, but it is not enough to fix the race between log flush and log rename.
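
For context, the exception-swallowing approach described above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the Kafka source: the class name `FlushDirSketch` is hypothetical, and the method bodies are an assumption about what a directory fsync and its "if exists" wrapper might look like on a POSIX-like system.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; names mirror the PR description, bodies are assumptions.
class FlushDirSketch {

    // Forces an fsync on the directory itself (assumes a POSIX-like OS where
    // a directory can be opened for reading and synced).
    static void flushDir(Path dir) throws IOException {
        try (FileChannel channel = FileChannel.open(dir, StandardOpenOption.READ)) {
            channel.force(true);
        }
    }

    // Same as flushDir, but tolerates the directory having disappeared
    // (for example, because it was deleted concurrently).
    static void flushDirIfExists(Path dir) throws IOException {
        try {
            flushDir(dir);
        } catch (NoSuchFileException e) {
            // Swallowed: the directory no longer exists under this path.
        }
    }
}
```

Swallowing the exception only hides the symptom. After a concurrent rename the directory still exists under a new path, so a flush against the stale path silently skips the fsync.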
   
   Since both log delete and log alter hit the race with log flush through log rename dir, this PR fixes the race for both by synchronizing log flush and log rename dir on the same lock in `UnifiedLog`. In more detail:
   1. The call to `localLog.flush` was moved under the synchronized block.
   2. The call to `Utils.flushDirIfExists` was replaced with a call to `Utils.flushDir`, since swallowing `NoSuchFileException` is no longer required once the race is addressed by 1.
   3. `Utils.flushDirIfExists` is removed, since it is no longer used.
   4. The unit test simulating a concurrent dir rename is removed, since that scenario can no longer happen after 1.
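
The locking idea in the list above can be illustrated with a minimal sketch. It is not the `UnifiedLog` code: the class `SynchronizedLogDirSketch` and its methods are hypothetical stand-ins, and it assumes a POSIX-like system where a directory can be opened and fsynced. The point is only that rename and flush take the same lock, so a flush always sees either the old path before the rename or the new path after it.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for the parts of a log that own a directory.
class SynchronizedLogDirSketch {
    private final Object lock = new Object();
    private Path dir; // guarded by lock

    SynchronizedLogDirSketch(Path dir) {
        this.dir = dir;
    }

    // Renames the directory and updates the tracked path under the lock.
    void renameDir(String newName) throws IOException {
        synchronized (lock) {
            Path target = dir.resolveSibling(newName);
            Files.move(dir, target, StandardCopyOption.ATOMIC_MOVE);
            dir = target;
        }
    }

    // Fsyncs the directory under the same lock, so the path cannot change
    // between reading it and opening it.
    void flushDir() throws IOException {
        synchronized (lock) {
            try (FileChannel channel = FileChannel.open(dir, StandardOpenOption.READ)) {
                channel.force(true);
            }
        }
    }

    Path currentDir() {
        synchronized (lock) {
            return dir;
        }
    }
}
```

With both operations serialized this way, a `NoSuchFileException` during flush would indicate a real problem and no longer an expected race, which is why the sketch does not catch it.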
   
   For details on the race condition, including code references, please see the 
description of https://issues.apache.org/jira/browse/KAFKA-15572 and comments.
   
   ## Testing
   
   This fix was tested by deploying trunk + patch of this PR to our staging 
clusters and running alter replica log dir on 1.5TB of data across 33863 
replica log dirs.
   
   ### Committer Checklist (excluded from commit message)
   - [x] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [x] Verify documentation (including upgrade notes)
   

