[
https://issues.apache.org/jira/browse/KAFKA-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257169#comment-16257169
]
Ismael Juma commented on KAFKA-5007:
------------------------------------
0.11.0.2 includes a couple of deadlock fixes and, as you can see in
https://issues.apache.org/jira/browse/KAFKA-5721?focusedCommentId=16257162&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16257162,
deadlocks can cause a build-up of connections in CLOSE_WAIT status. The
announcement is going out very soon, but you can already find it in the mirrors:
http://apache.mirror.anlx.net/kafka/0.11.0.2/
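
To check whether a broker host is building up CLOSE_WAIT connections as described above, something like the following rough sketch may help. It is not part of Kafka. It assumes a Linux host and counts sockets per TCP state from text in the format of /proc/net/tcp. The function name and the sample rows are made up for illustration, and the counts are host-wide rather than per process.

```python
from collections import Counter

# TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "03": "SYN_RECV",
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
    "09": "LAST_ACK",
    "0A": "LISTEN",
    "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text: str) -> Counter:
    """Count sockets per TCP state in /proc/net/tcp-formatted text (hypothetical helper)."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated row
        # fields: [0] slot, [1] local address, [2] remote address, [3] state
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


# Invented sample rows, for illustration only (local port 0x2384 is 9092).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:2384 0100007F:C350 08 00000000:00000000 00:00000000 00000000  1000        0 12345 1 0000000000000000 20 4 30 10 -1
   1: 0100007F:2384 0100007F:C351 01 00000000:00000000 00:00000000 00000000  1000        0 12346 1 0000000000000000 20 4 30 10 -1
   2: 0100007F:2384 0100007F:C352 08 00000000:00000000 00:00000000 00000000  1000        0 12347 1 0000000000000000 20 4 30 10 -1
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
```

On a real host, pass in the contents of /proc/net/tcp (and /proc/net/tcp6 for IPv6 sockets) instead of SAMPLE. A CLOSE_WAIT count that keeps growing across runs would be consistent with the symptom described in the linked comment.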
> Kafka Replica Fetcher Thread- Resource Leak
> -------------------------------------------
>
> Key: KAFKA-5007
> URL: https://issues.apache.org/jira/browse/KAFKA-5007
> Project: Kafka
> Issue Type: Bug
> Components: core, network
> Affects Versions: 0.10.0.0, 0.10.1.1, 0.10.2.0
> Environment: Centos 7
> Java 8
> Reporter: Joseph Aliase
> Priority: Critical
> Labels: reliability
> Attachments: jstack-kafka.out, jstack-zoo.out, lsofkafka.txt,
> lsofzookeeper.txt
>
>
> Kafka runs out of open file descriptors when the system network interface is
> down.
> Issue description:
> We have a 5-node Kafka cluster running version 0.10.1.1. The open file
> descriptor limit for the account running Kafka is set to 100000.
> During an upgrade, the network interface went down. The outage continued for
> 12 hours, and eventually all the brokers crashed with a
> java.io.IOException: Too many open files error.
> We repeated the test in a lower environment and observed that the open socket
> count keeps increasing while the NIC is down.
> We have around 13 topics with a max partition size of 120, and the number of
> replica fetcher threads is set to 8.
> Using an internal monitoring tool, we observed that the open socket descriptor
> count for the broker pid continued to increase while the NIC was down, leading
> to the "Too many open files" error.
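
The reporter's internal monitoring tool is not shown. As a stand-in, here is a minimal sketch of how the open descriptor and socket counts for a pid could be sampled on Linux and compared with the process's open-files limit. It assumes /proc is mounted and that the caller has permission to read the target process's entries. Both function names are hypothetical.

```python
import os


def fd_usage(pid: int) -> dict:
    """Count open descriptors, and how many are sockets, for a pid (hypothetical helper)."""
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink; socket descriptors point at "socket:[inode]".
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        total += 1
        if target.startswith("socket:"):
            sockets += 1
    return {"total": total, "sockets": sockets}


def soft_nofile_limit(pid: int):
    """Return the soft "Max open files" limit from /proc/<pid>/limits, or None if unlimited or absent."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: Max open files <soft> <hard> <units>
                return None if soft == "unlimited" else int(soft)
    return None


if __name__ == "__main__":
    pid = os.getpid()  # replace with the broker pid
    print(fd_usage(pid), soft_nofile_limit(pid))
```

Sampling these numbers periodically while the NIC is down would show whether the socket count climbs toward the limit, which is the pattern reported in this issue.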