EverZhang created KAFKA-7814:
--------------------------------
Summary: Broker shut down while cleaning up log file
Key: KAFKA-7814
URL: https://issues.apache.org/jira/browse/KAFKA-7814
Project: Kafka
Issue Type: Bug
Components: log, offset manager
Affects Versions: 2.1.0, 1.1.0
Environment: os: aliYun, centos7
docker image:wurstmeister/kafka:2.12-2.1.0
Reporter: EverZhang
We run a Kafka cluster with 3 brokers (version 1.1.0) that had been running well
for over 6 months.
After 2018/12/12 we increased the partition count from 3 to 48 for every topic,
and since then the brokers have shut down every 5-10 days.
We then upgraded the brokers from 1.1.0 to 2.1.0, but they still shut down
every 5-10 days.
Each time, one broker shuts down after logging the error below. Several minutes
later the other 2 brokers shut down too, with the same error but for different
partition log files.
{code:bash}
[2019-01-11 17:16:36,572] INFO [ProducerStateManager partition=__transaction_state-11] Writing producer snapshot at offset 807760 (kafka.log.ProducerStateManager)
[2019-01-11 17:16:36,572] INFO [Log partition=__transaction_state-11, dir=/kafka/logs] Rolled new log segment at offset 807760 in 4 ms. (kafka.log.Log)
[2019-01-11 17:16:46,150] WARN Resetting first dirty offset of __transaction_state-35 to log start offset 194404 since the checkpointed offset 194345 is invalid. (kafka.log.LogCleanerManager$)
[2019-01-11 17:16:46,239] ERROR Failed to clean up log for __transaction_state-11 in dir /kafka/logs due to IOException (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:809)
    at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:222)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    at kafka.log.Log.asyncDeleteSegment(Log.scala:1838)
    at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1901)
    at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1896)
    at scala.collection.immutable.List.foreach(List.scala:388)
    at kafka.log.Log.replaceSegments(Log.scala:1896)
    at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:583)
    at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:515)
    at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:514)
    at scala.collection.immutable.List.foreach(List.scala:388)
    at kafka.log.Cleaner.doClean(LogCleaner.scala:514)
    at kafka.log.Cleaner.clean(LogCleaner.scala:492)
    at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:353)
    at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319)
    at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
    Suppressed: java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log -> /kafka/logs/__transaction_state-11/00000000000000807727.log.deleted
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806)
        ... 17 more
[2019-01-11 17:16:46,245] INFO [ReplicaManager broker=2] Stopping serving replicas in dir /kafka/logs (kafka.server.ReplicaManager)
[2019-01-11 17:16:46,314] INFO Stopping serving logs in dir /kafka/logs (kafka.log.LogManager)
[2019-01-11 17:16:46,326] ERROR Shutdown broker because all log dirs in /kafka/logs have failed (kafka.log.LogManager)
{code}
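For anyone reading the trace: the two Utils.atomicMoveWithFallback frames (the primary exception and the suppressed one) look like what a "try an atomic rename, then fall back to a plain move" pattern produces when the source file is already gone. The snippet below is only a minimal sketch of that pattern under that assumption, not Kafka's actual code. The class name AtomicMoveSketch, the method moveWithFallback, and the segment file name are hypothetical. It shows the shape of the exception but does not explain why the segment file was missing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicMoveSketch {

    // Try an atomic rename first. If that fails, retry as a plain move and
    // attach the first failure to the second one as a suppressed exception.
    static void moveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException atomicFailure) {
            try {
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException fallbackFailure) {
                fallbackFailure.addSuppressed(atomicFailure);
                throw fallbackFailure;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segment-demo");
        // Hypothetical segment name; the file is deliberately never created.
        Path segment = dir.resolve("00000000000000000000.log");
        Path renamed = dir.resolve("00000000000000000000.log.deleted");
        try {
            moveWithFallback(segment, renamed);
        } catch (NoSuchFileException e) {
            // Both attempts fail because the source does not exist.
            System.out.println("primary: " + e);
            for (Throwable s : e.getSuppressed()) {
                System.out.println("suppressed: " + s);
            }
        }
    }
}
```

Running main against a missing source should surface one NoSuchFileException carrying one suppressed exception, the same shape as in the log above. That would be consistent with the .log file having been removed or renamed before the cleaner tried to rename it to .deleted.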