[ 
https://issues.apache.org/jira/browse/KAFKA-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

EverZhang updated KAFKA-7814:
-----------------------------
    Attachment: broker2.server.log.2019-01-11-17

> Broker shut down while cleaning up log file
> -------------------------------------------
>
>                 Key: KAFKA-7814
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7814
>             Project: Kafka
>          Issue Type: Bug
>          Components: log, offset manager
>    Affects Versions: 1.1.0, 2.1.0
>         Environment: os: aliYun, centos7
> docker image:wurstmeister/kafka:2.12-2.1.0
>            Reporter: EverZhang
>            Priority: Critical
>         Attachments: broker1.log-cleaner.log.2019-01-11-17, 
> broker1.server.log.2019-01-11-17, broker2.log-cleaner.log.2019-01-11-17, 
> broker2.server.log.2019-01-11-17, broker3.log-cleaner.log.2019-01-11-17, 
> broker3.server.log.2019-01-11-17
>
>
> We have a Kafka cluster with 3 brokers (version 1.1.0) that ran well for 
> over 6 months.
> After 2018/12/12 we increased the partition count from 3 to 48 for every 
> topic, and since then the brokers have shut down every 5-10 days.
> We then upgraded the brokers from 1.1.0 to 2.1.0, but they still shut down 
> every 5-10 days.
> Each time, one broker shuts down after the following error log; several 
> minutes later, the other 2 brokers shut down too, with the same error but 
> for other partition log files.
> {code:bash}
> [2019-01-11 17:16:36,572] INFO [ProducerStateManager 
> partition=__transaction_state-11] Writing producer snapshot at offset 807760 
> (kafka.log.ProducerStateManager)
> [2019-01-11 17:16:36,572] INFO [Log partition=__transaction_state-11, 
> dir=/kafka/logs] Rolled new log segment at offset 807760 in 4 ms. 
> (kafka.log.Log)
> [2019-01-11 17:16:46,150] WARN Resetting first dirty offset of 
> __transaction_state-35 to log start offset 194404 since the checkpointed 
> offset 194345 is invalid. (kafka.log.LogCleanerManager$)
> [2019-01-11 17:16:46,239] ERROR Failed to clean up log for 
> __transaction_state-11 in dir /kafka/logs due to IOException 
> (kafka.server.LogDirFailureChannel)
> java.nio.file.NoSuchFileException: 
> /kafka/logs/__transaction_state-11/00000000000000807727.log
>         at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
>         at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>         at java.nio.file.Files.move(Files.java:1395)
>         at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:809)
>         at 
> org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:222)
>         at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
>         at kafka.log.Log.asyncDeleteSegment(Log.scala:1838)
>         at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1901)
>         at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1896)
>         at scala.collection.immutable.List.foreach(List.scala:388)
>         at kafka.log.Log.replaceSegments(Log.scala:1896)
>         at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:583)
>         at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:515)
>         at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:514)
>         at scala.collection.immutable.List.foreach(List.scala:388)
>         at kafka.log.Cleaner.doClean(LogCleaner.scala:514)
>         at kafka.log.Cleaner.clean(LogCleaner.scala:492)
>         at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:353)
>         at 
> kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319)
>         at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
>         Suppressed: java.nio.file.NoSuchFileException: 
> /kafka/logs/__transaction_state-11/00000000000000807727.log -> 
> /kafka/logs/__transaction_state-11/00000000000000807727.log.deleted
>                 at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>                 at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>                 at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
>                 at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>                 at java.nio.file.Files.move(Files.java:1395)
>                 at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806)
>                 ... 17 more
> [2019-01-11 17:16:46,245] INFO [ReplicaManager broker=2] Stopping serving 
> replicas in dir /kafka/logs (kafka.server.ReplicaManager)
> [2019-01-11 17:16:46,314] INFO Stopping serving logs in dir /kafka/logs 
> (kafka.log.LogManager)
> [2019-01-11 17:16:46,326] ERROR Shutdown broker because all log dirs in 
> /kafka/logs have failed (kafka.log.LogManager)
> {code}
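
The trace above reports a primary NoSuchFileException for the segment file, plus a suppressed one whose message has the form "source -> target". Both come from Files.move calls inside Utils.atomicMoveWithFallback. A minimal sketch of how that exception shape can arise, assuming a two-step move (an atomic rename first, then a plain move) where the source file is already gone when the rename starts. The class name MoveFallbackSketch and the method moveWithFallback are hypothetical and are not Kafka's code; the file names are taken from the log only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveFallbackSketch {

    // Hypothetical helper (not Kafka's implementation): try an atomic rename,
    // and if that fails, retry with a plain move. If both attempts fail, the
    // second failure is thrown with the first attached as a suppressed exception.
    static void moveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException atomicFailure) {
            try {
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException fallbackFailure) {
                fallbackFailure.addSuppressed(atomicFailure);
                throw fallbackFailure;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segment-demo");
        // The source file is deliberately never created, standing in for a
        // segment file that disappeared before the rename to ".deleted".
        Path segment = dir.resolve("00000000000000807727.log");
        Path renamed = dir.resolve("00000000000000807727.log.deleted");
        try {
            moveWithFallback(segment, renamed);
        } catch (NoSuchFileException e) {
            System.out.println("primary: " + e.getMessage());
            for (Throwable s : e.getSuppressed()) {
                System.out.println("suppressed: " + s.getMessage());
            }
        } finally {
            Files.delete(dir);
        }
    }
}
```

The sketch only shows that a missing source file is enough to produce both exceptions. It does not explain why the segment file was missing on the broker.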



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)