[ https://issues.apache.org/jira/browse/KAFKA-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376281#comment-17376281 ]
Kiran commented on KAFKA-9877:
------------------------------

I am seeing the same issue on Kafka 1.0 as well:

{code:java}
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for testtopic-0 in dir /kafka-logs
Caused by: java.io.IOException: Delete of log 00000000000000000000.log.deleted failed.
	at kafka.log.LogSegment.delete(LogSegment.scala:496)
	at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
	at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
	at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
	at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
	at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
	at kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
	at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
	at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

I also see a lot of the errors shown below. I have compaction enabled for the topic with the following config:

{code}
segment.ms=100ms
delete.retention.ms=100ms
{code}
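For reference, {{segment.ms}} and {{delete.retention.ms}} take plain millisecond values, so settings this low force a segment roll (and a subsequent segment deletion) roughly every 100 ms, which keeps the delete path in the stack trace above running almost constantly. A hedged sketch of how such topic overrides are applied with the stock tooling — the topic name and ZooKeeper address here are placeholders, not taken from the reports above:

{code}
# Hypothetical example — adjust the topic name and connect string for your cluster.
# Both values are plain milliseconds: 100 means a new segment roughly every 100 ms.
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name testtopic \
  --add-config segment.ms=100,delete.retention.ms=100
{code}

Rolling and deleting segments this frequently is rarely intended; values in the range of minutes to hours are more typical for these settings.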
{code:java}
ERROR Error while processing data for partition testtopic1-18 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.errors.KafkaStorageException: Replica 3 is in an offline log directory for partition testtopic-10
{code}

> ERROR Shutdown broker because all log dirs in /tmp/kafka-logs have failed (kafka.log.LogManager)
> ------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9877
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9877
>             Project: Kafka
>          Issue Type: Bug
>          Components: log cleaner
>    Affects Versions: 2.1.1
>         Environment: Redhat
>            Reporter: Hawking Du
>            Priority: Major
>         Attachments: server-125.log
>
> This confusing problem has been with me for a long time: the Kafka server often stops unexpectedly, apparently because of the log cleaning process. Here are some logs from the server. Can anyone give me some ideas for fixing it?
>
> {code:java}
> [2020-04-04 02:07:57,410] INFO [GroupMetadataManager brokerId=5] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> [2020-04-04 02:07:57,410] INFO [GroupMetadataManager brokerId=5] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> [2020-04-04 02:17:57,410] INFO [GroupMetadataManager brokerId=5] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> [2020-04-04 02:27:57,410] INFO [GroupMetadataManager brokerId=5] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> [2020-04-04 02:30:22,272] INFO [ProducerStateManager partition=__consumer_offsets-35] Writing producer snapshot at offset 741037 (kafka.log.ProducerStateManager)
> [2020-04-04 02:30:22,274] INFO [Log partition=__consumer_offsets-35, dir=/tmp/kafka-logs] Rolled new log segment at offset 741037 in 3 ms. (kafka.log.Log)
> [2020-04-04 02:30:26,289] ERROR Failed to clean up log for __consumer_offsets-35 in dir /tmp/kafka-logs due to IOException (kafka.server.LogDirFailureChannel)
> java.nio.file.NoSuchFileException: /tmp/kafka-logs/__consumer_offsets-35/00000000000000000000.log
> 	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> 	at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
> 	at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
> 	at java.nio.file.Files.move(Files.java:1395)
> 	at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:815)
> 	at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
> 	at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:508)
> 	at kafka.log.Log.asyncDeleteSegment(Log.scala:1962)
> 	at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:2025)
> 	at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:2020)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at kafka.log.Log.replaceSegments(Log.scala:2020)
> 	at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:602)
> 	at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:528)
> 	at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:527)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at kafka.log.Cleaner.doClean(LogCleaner.scala:527)
> 	at kafka.log.Cleaner.clean(LogCleaner.scala:501)
> 	at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:359)
> 	at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:328)
> 	at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:307)
> 	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:89)
> 	Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/__consumer_offsets-35/00000000000000000000.log -> /tmp/kafka-logs/__consumer_offsets-35/00000000000000000000.log.deleted
> 		at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> 		at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> 		at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
> 		at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
> 		at java.nio.file.Files.move(Files.java:1395)
> 		at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:812)
> 		... 17 more
> [2020-04-04 02:30:26,296] INFO [ReplicaManager broker=5] Stopping serving replicas in dir /tmp/kafka-logs (kafka.server.ReplicaManager)
> [2020-04-04 02:30:26,302] INFO [ReplicaFetcherManager on broker 5] Removed fetcher for partitions Set(fitment-deduct-0, __consumer_offsets-22, __consumer_offsets-30, __consumer_offsets-4, __consumer_offsets-27, __consumer_offsets-7, __consumer_offsets-9, __consumer_offsets-46, __consumer_offsets-35, __consumer_offsets-23, __consumer_offsets-49, __consumer_offsets-47, test-0, __consumer_offsets-31, __consumer_offsets-42, __consumer_offsets-3, __consumer_offsets-18, __consumer_offsets-15, __consumer_offsets-24, ajhz-log-0, __consumer_offsets-38, __consumer_offsets-19, __consumer_offsets-11, bpinfo-sync-0, spinfo-sync-backup-0, __consumer_offsets-2, __consumer_offsets-43, __consumer_offsets-6, __consumer_offsets-14, __consumer_offsets-44, __consumer_offsets-39, __consumer_offsets-26, __consumer_offsets-29, __consumer_offsets-34, __consumer_offsets-10, video-log-0) (kafka.server.ReplicaFetcherManager)
> [2020-04-04 02:30:26,303] INFO [ReplicaAlterLogDirsManager on broker 5] Removed fetcher for partitions Set(fitment-deduct-0, __consumer_offsets-22, __consumer_offsets-30, __consumer_offsets-4, __consumer_offsets-27, __consumer_offsets-7, __consumer_offsets-9, __consumer_offsets-46, __consumer_offsets-35, __consumer_offsets-23, __consumer_offsets-49, __consumer_offsets-47, test-0, __consumer_offsets-31, __consumer_offsets-42, __consumer_offsets-3, __consumer_offsets-18, __consumer_offsets-15, __consumer_offsets-24, ajhz-log-0, __consumer_offsets-38, __consumer_offsets-19, __consumer_offsets-11, bpinfo-sync-0, spinfo-sync-backup-0, __consumer_offsets-2, __consumer_offsets-43, __consumer_offsets-6, __consumer_offsets-14, __consumer_offsets-44, __consumer_offsets-39, __consumer_offsets-26, __consumer_offsets-29, __consumer_offsets-34, __consumer_offsets-10, video-log-0) (kafka.server.ReplicaAlterLogDirsManager)
> [2020-04-04 02:30:26,330] INFO [ReplicaManager broker=5] Broker 5 stopped fetcher for partitions fitment-deduct-0,__consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-35,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,test-0,__consumer_offsets-31,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,__consumer_offsets-15,__consumer_offsets-24,ajhz-log-0,__consumer_offsets-38,__consumer_offsets-19,__consumer_offsets-11,bpinfo-sync-0,spinfo-sync-backup-0,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,video-log-0 and stopped moving logs for partitions because they are in the failed log directory /tmp/kafka-logs. (kafka.server.ReplicaManager)
> [2020-04-04 02:30:26,330] INFO Stopping serving logs in dir /tmp/kafka-logs (kafka.log.LogManager)
> [2020-04-04 02:30:26,347] ERROR Shutdown broker because all log dirs in /tmp/kafka-logs have failed (kafka.log.LogManager)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
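A pattern worth noting in the reports above: the failing log directory lives under /tmp ({{/tmp/kafka-logs}}). On Redhat-family systems, OS temp cleaners such as systemd-tmpfiles or tmpwatch can remove files under /tmp that the broker still references, which produces exactly this kind of {{NoSuchFileException}} during the cleaner's rename/delete step and eventually marks the whole log directory offline. A hedged sketch of the usual remediation — the target path here is a placeholder, not taken from the logs:

{code}
# server.properties — keep broker data out of /tmp so OS temp cleanup
# cannot delete segment files out from under the running broker.
log.dirs=/var/lib/kafka/data
{code}

If the data must stay where it is, excluding the directory from the temp cleaner's aging rules is an alternative.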