Petr Pchelko created KAFKA-7156:
-----------------------------------
Summary: Deleting topics with long names can bring all brokers to
unrecoverable state
Key: KAFKA-7156
URL: https://issues.apache.org/jira/browse/KAFKA-7156
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 1.1.0
Reporter: Petr Pchelko
Kafka limits topic names to 249 characters, so creating a topic with a
248-character name is possible. However, when the topic is deleted, Kafka tries
to rename the topic's data directory, appending a hash and `-delete` to the
directory name, so the resulting file name exceeds the 255-character file name
limit of most Unix file systems. This provokes a
java.nio.file.FileSystemException, which in turn immediately shuts down all the
brokers. Further attempts to restart the brokers fail with the same exception.
The only way to resurrect the cluster is to manually delete the affected topic
from ZooKeeper and from the disk on all the broker machines.
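The length arithmetic can be checked with a small sketch. It is not Kafka's code: the class and method names are hypothetical, and the renamed directory layout `<topic>-<partition>.<32 hex chars>-delete` is assumed from the log output in this report. It also assumes ASCII topic names, so one character is one byte.

```java
// Hypothetical helper, not part of Kafka: estimates the length of the
// directory name produced when a partition directory is renamed for deletion.
final class DeleteDirNameLength {
    // File name limit of most Unix file systems, as stated in this report.
    static final int MAX_FILE_NAME = 255;
    // Length of the hash seen in the log, e.g. 093fd1e1728f438ea990cbad8a514b9f.
    static final int HASH_LENGTH = 32;
    // Suffix seen in the log output.
    static final String DELETE_SUFFIX = "-delete";

    // Length of "<topic>-<partition>.<hash>-delete".
    static int renamedDirLength(int topicNameLength, int partition) {
        String partitionSuffix = "-" + partition;
        return topicNameLength
                + partitionSuffix.length()
                + 1 // the '.' separating the partition directory name from the hash
                + HASH_LENGTH
                + DELETE_SUFFIX.length();
    }

    // True if the renamed directory name stays within the file name limit.
    static boolean fits(int topicNameLength, int partition) {
        return renamedDirLength(topicNameLength, partition) <= MAX_FILE_NAME;
    }

    public static void main(String[] args) {
        System.out.println(renamedDirLength(248, 0)); // 248 + 2 + 1 + 32 + 7 = 290
        System.out.println(fits(248, 0));             // false: 290 > 255
        System.out.println(fits(213, 0));             // true: 213 + 42 = 255
    }
}
```

Under these assumptions, a topic with a single-digit partition number can be renamed safely only if its name is at most 213 characters long, well below the 249-character limit enforced at creation time.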
Steps to reproduce:
(Note: delete.topic.enable=true must be set in the config)
{code:java}
> kafka-topics.sh --zookeeper localhost:2181 --create --topic
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
> --partitions 1 --replication-factor 1
> kafka-topics.sh --zookeeper localhost:2181 --delete --topic
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
{code}
After these two commands are executed, all the brokers where this topic is
replicated immediately shut down with the following logs:
{code:java}
ERROR Error while renaming dir for
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0
in log dir /tmp/kafka-logs (kafka.server.LogDirFailureChannel)
java.nio.file.FileSystemException:
/tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0
->
/tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0.093fd1e1728f438ea990cbad8a514b9f-delete:
File name too long
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:457)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
...
Suppressed: java.nio.file.FileSystemException:
/tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0
->
/tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0.093fd1e1728f438ea990cbad8a514b9f-delete:
File name too long
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
... 23 more
[2018-07-12 13:34:45,847] INFO [ReplicaManager broker=0] Stopping serving
replicas in dir /tmp/kafka-logs (kafka.server.ReplicaManager)
[2018-07-12 13:34:45,848] INFO [ReplicaFetcherManager on broker 0] Removed
fetcher for partitions (kafka.server.ReplicaFetcherManager)
[2018-07-12 13:34:45,849] INFO [ReplicaAlterLogDirsManager on broker 0] Removed
fetcher for partitions (kafka.server.ReplicaAlterLogDirsManager)
[2018-07-12 13:34:45,851] INFO [ReplicaManager broker=0] Broker 0 stopped
fetcher for partitions and stopped moving logs for partitions because they
are in the failed log directory /tmp/kafka-logs. (kafka.server.ReplicaManager)
[2018-07-12 13:34:45,851] INFO Stopping serving logs in dir /tmp/kafka-logs
(kafka.log.LogManager)
[2018-07-12 13:34:45,854] ERROR Shutdown broker because all log dirs in
/tmp/kafka-logs have failed (kafka.log.LogManager)
[2018-07-12 13:34:46,264] WARN Exception causing close of session
0x1648e0b3ec80004 due to java.io.IOException: Connection reset by peer
(org.apache.zookeeper.server.NIOServerCnxn)
[2018-07-12 13:34:46,264] INFO Closed socket connection for client
/0:0:0:0:0:0:0:1:63972 which had sessionid 0x1648e0b3ec80004
(org.apache.zookeeper.server.NIOServerCnxn)
{code}
Note that
{code:java}
[2018-07-12 13:34:45,854] ERROR Shutdown broker because all log dirs in
/tmp/kafka-logs have failed (kafka.log.LogManager){code}
happens regardless of whether the topic with the long name is the only one on
the broker.
Further attempts to restart the brokers fail with the same error until all
mentions of the deleted topic are removed from ZooKeeper and its files are
removed from the data directories on all the brokers.