ccding opened a new pull request #11351:
URL: https://github.com/apache/kafka/pull/11351


   We have seen an exception caused by the order in which the scheduler and LogManager are shut down.
   
   When LogManager was closing partitions one by one, the scheduler triggered deletion of old segments due to retention. However, those segments could already have been closed by LogManager, which caused an exception and subsequently marked the log dir as offline. As a result, the broker did not flush the remaining partitions and did not write the clean shutdown marker. Ultimately, the broker took hours to recover the logs during restart.
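The race in the previous paragraph can be illustrated with a toy model. This is a minimal sketch, not Kafka code: it assumes a plain `java.util.concurrent` scheduled executor in place of the broker's scheduler, and `Segment`, `deleteForRetention`, and `shutdown(boolean)` are hypothetical names. A periodic "retention" task walks the segments, and closing a segment while that task can still run produces the same kind of "already closed" failure described above. The model compares stopping the scheduler before closing the segments against closing the segments while the scheduler is still alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownOrderDemo {

    // Hypothetical stand-in for a log segment; not Kafka's LogSegment.
    static final class Segment {
        private volatile boolean closed = false;

        void close() {
            closed = true;
        }

        void deleteForRetention() {
            if (closed) {
                // Models the failure: retention touching an already-closed segment.
                throw new IllegalStateException("segment already closed");
            }
        }
    }

    /**
     * Returns true if the retention task hit a closed segment.
     * schedulerFirst = true:  stop the scheduler, then close the segments.
     * schedulerFirst = false: close the segments while the scheduler is still running.
     */
    static boolean shutdown(boolean schedulerFirst) throws InterruptedException, ExecutionException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            segments.add(new Segment());
        }
        AtomicBoolean failed = new AtomicBoolean(false);

        Runnable retentionTask = () -> {
            try {
                for (Segment s : segments) {
                    s.deleteForRetention();
                }
            } catch (IllegalStateException e) {
                failed.set(true); // in the broker, this is where the log dir went offline
            }
        };
        // Periodic retention check, analogous to a scheduled background job.
        scheduler.scheduleAtFixedRate(retentionTask, 0, 10, TimeUnit.MILLISECONDS);

        if (schedulerFirst) {
            scheduler.shutdown(); // default policy cancels future periodic runs
            scheduler.awaitTermination(5, TimeUnit.SECONDS); // wait for any in-flight run
            segments.forEach(Segment::close);
        } else {
            segments.forEach(Segment::close);
            // Force one run inside the close window so the outcome is deterministic.
            scheduler.submit(retentionTask).get();
            scheduler.shutdown();
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        }
        return failed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("scheduler first, error = " + shutdown(true));
        System.out.println("segments first, error = " + shutdown(false));
    }
}
```

In this model only the second ordering reports an error: once `shutdown()` and `awaitTermination()` return, no retention run can overlap with closing the segments.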
   
   This PR essentially reverts https://github.com/apache/kafka/pull/10538.
   
   I believe the exception encountered in https://github.com/apache/kafka/pull/10538 originates at https://github.com/apache/kafka/blob/5a6f19b2a1ff72c52ad627230ffdf464456104ee/core/src/main/scala/kafka/log/LocalLog.scala#L895-L903, where LocalLog calls the scheduler; that exception crashed the compaction thread. Its effect has since been mitigated by https://github.com/apache/kafka/pull/10763.
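The opposite hazard, a log component asking an already-stopped scheduler to run more work, can be sketched the same way. This is a hedged illustration using `java.util.concurrent` rather than Kafka's own scheduler: with the JDK's default rejection policy, a shut-down scheduled executor throws `RejectedExecutionException` when a new task is scheduled. Kafka's scheduler may report this condition differently, and `scheduleDeleteAfterShutdown` and the printed "deleting files" task are hypothetical stand-ins for the asynchronous file deletion referenced above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduleAfterShutdownDemo {

    /** Returns true if scheduling a hypothetical "delete files" task was rejected. */
    static boolean scheduleDeleteAfterShutdown() throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // The scheduler is stopped before the log layer finishes its work.
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
        try {
            // Hypothetical stand-in for deferred deletion of segment files.
            scheduler.schedule(() -> System.out.println("deleting files"), 60, TimeUnit.SECONDS);
            return false;
        } catch (RejectedExecutionException e) {
            // Default AbortPolicy: a shut-down executor rejects new tasks.
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected = " + scheduleDeleteAfterShutdown());
    }
}
```

An uncaught exception of this kind on a background thread is what ends that thread, which is why the caller's error handling matters as much as the shutdown order.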
   
   cc @rondagostino @ijuma @cmccabe @junrao @dhruvilshah3, as authors/reviewers of the PRs mentioned above, to make sure this change looks okay.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

