[ https://issues.apache.org/jira/browse/KAFKA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057699#comment-17057699 ]
ASF GitHub Bot commented on KAFKA-9458: --------------------------------------- hirik commented on pull request #8276: KAFKA-9458: Kafka broker crashes in windows platform URL: https://github.com/apache/kafka/pull/8276 Reason: I found that in LogSegment, files are being renamed as per the status flag(deleted, cleaned, swap). In windows renaming or deleting an opened file is not allowed. Existing behavior (LogSegment Management): While deleting a segment or creating a clean segment, we are trying to change the status of the segment. Segment status is maintained by the file suffix. Possible solutions: 1. Close the existing files before renaming or deleting 2. Maintain a separate status file to store the status of the segment. I've implemented the second solution and it's successfully running on my windows environment. Please review the changes and let me know your comments. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Kafka crashed in windows environment > ------------------------------------ > > Key: KAFKA-9458 > URL: https://issues.apache.org/jira/browse/KAFKA-9458 > Project: Kafka > Issue Type: Bug > Components: log > Affects Versions: 2.4.0 > Environment: Windows Server 2019 > Reporter: hirik > Priority: Critical > Labels: windows > Fix For: 2.6.0 > > Attachments: Windows_crash_fix.patch, logs.zip > > > Hi, > while I was trying to validate Kafka retention policy, Kafka Server crashed > with below exception trace. > [2020-01-21 17:10:40,475] INFO [Log partition=test1-3, > dir=C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka] > Rolled new log segment at offset 1 in 52 ms. (kafka.log.Log) > [2020-01-21 17:10:40,484] ERROR Error while deleting segments for test1-3 in > dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka > (kafka.server.LogDirFailureChannel) > java.nio.file.FileSystemException: > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex > -> > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex.deleted: > The process cannot access the file because it is being used by another > process. > at > java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) > at > java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) > at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395) > at > java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292) > at java.base/java.nio.file.Files.move(Files.java:1425) > at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:795) > at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209) > at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497) > at kafka.log.Log.$anonfun$deleteSegmentFiles$1(Log.scala:2206) > at kafka.log.Log.$anonfun$deleteSegmentFiles$1$adapted(Log.scala:2206) > at scala.collection.immutable.List.foreach(List.scala:305) > at kafka.log.Log.deleteSegmentFiles(Log.scala:2206) > at kafka.log.Log.removeAndDeleteSegments(Log.scala:2191) > at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1700) > at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17) > at kafka.log.Log.maybeHandleIOException(Log.scala:2316) > at kafka.log.Log.deleteSegments(Log.scala:1691) > at kafka.log.Log.deleteOldSegments(Log.scala:1686) > at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1763) > at kafka.log.Log.deleteOldSegments(Log.scala:1753) > at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:982) > at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:979) > at scala.collection.immutable.List.foreach(List.scala:305) > at kafka.log.LogManager.cleanupLogs(LogManager.scala:979) > at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:403) > at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) > at > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:830) > Suppressed: java.nio.file.FileSystemException: > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex > -> > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex.deleted: > The process cannot access the file because it is being used by another > process. > at > java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) > at > java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) > at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:309) > at > java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292) > at java.base/java.nio.file.Files.move(Files.java:1425) > at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:792) > ... 27 more > [2020-01-21 17:10:40,495] INFO [ReplicaManager broker=0] Stopping serving > replicas in dir > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka > (kafka.server.ReplicaManager) > [2020-01-21 17:10:40,495] ERROR Uncaught exception in scheduled task > 'kafka-log-retention' (kafka.utils.KafkaScheduler) > org.apache.kafka.common.errors.KafkaStorageException: Error while deleting > segments for test1-3 in dir > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka > Caused by: java.nio.file.FileSystemException: > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex > -> > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex.deleted: > The process cannot access the file because it is being used by another > process. > at > java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) > at > java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) > at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395) > at > java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292) > at java.base/java.nio.file.Files.move(Files.java:1425) > at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:795) > at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209) > at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497) > at kafka.log.Log.$anonfun$deleteSegmentFiles$1(Log.scala:2206) > at kafka.log.Log.$anonfun$deleteSegmentFiles$1$adapted(Log.scala:2206) > at scala.collection.immutable.List.foreach(List.scala:305) > at kafka.log.Log.deleteSegmentFiles(Log.scala:2206) > at kafka.log.Log.removeAndDeleteSegments(Log.scala:2191) > at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1700) > at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17) > at kafka.log.Log.maybeHandleIOException(Log.scala:2316) > at kafka.log.Log.deleteSegments(Log.scala:1691) > at kafka.log.Log.deleteOldSegments(Log.scala:1686) > at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1763) > at kafka.log.Log.deleteOldSegments(Log.scala:1753) > at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:982) > at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:979) > at scala.collection.immutable.List.foreach(List.scala:305) > at kafka.log.LogManager.cleanupLogs(LogManager.scala:979) > at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:403) > at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) > at > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:830) > Suppressed: java.nio.file.FileSystemException: > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex > -> > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex.deleted: > The process cannot access the file because it is being used by another > process. > at > java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) > at > java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) > at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:309) > at > java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292) > at java.base/java.nio.file.Files.move(Files.java:1425) > at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:792) > ... 27 more > [2020-01-21 17:10:40,505] INFO [ReplicaFetcherManager on broker 0] Removed > fetcher for partitions HashSet(test1-3, test1-7, test-0, test1-0, test1-1, > test1-5, test1-2, test1-8, test1-4, test1-9, test1-6) > (kafka.server.ReplicaFetcherManager) > [2020-01-21 17:10:40,507] INFO [ReplicaAlterLogDirsManager on broker 0] > Removed fetcher for partitions HashSet(test1-3, test1-7, test-0, test1-0, > test1-1, test1-5, test1-2, test1-8, test1-4, test1-9, test1-6) > (kafka.server.ReplicaAlterLogDirsManager) > [2020-01-21 17:10:40,522] INFO [ReplicaManager broker=0] Broker 0 stopped > fetcher for partitions > test1-3,test1-7,test-0,test1-0,test1-1,test1-5,test1-2,test1-8,test1-4,test1-9,test1-6 > and stopped moving logs for partitions because they are in the failed log > directory > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka. > (kafka.server.ReplicaManager) > [2020-01-21 17:10:40,523] INFO Stopping serving logs in dir > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka > (kafka.log.LogManager) > [2020-01-21 17:10:40,526] ERROR Shutdown broker because all log dirs in > C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka have > failed (kafka.log.LogManager) -- This message was sent by Atlassian Jira (v8.3.4#803005)