[
https://issues.apache.org/jira/browse/KAFKA-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17075166#comment-17075166
]
zhangzhanchang edited comment on KAFKA-6529 at 4/4/20, 3:09 PM:
----------------------------------------------------------------
Thanks for your comments. When I run /usr/sbin/lsof -p on that process, I see a
lot of "can't identify protocol" entries. Does this mean there are many file
descriptor leaks in the broker? If I don't use librdkafka, will the problem not
be triggered? I tested many times with the Java Kafka client and did not see
this problem.
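For watching a suspected descriptor leak programmatically, a minimal sketch
follows, assuming a HotSpot JVM on Linux (where the platform OS MXBean
implements com.sun.management.UnixOperatingSystemMXBean); the class name
FdWatch and the 10-second interval are illustrative choices, not anything from
this ticket:
{code:java}
import java.lang.management.ManagementFactory;

import com.sun.management.UnixOperatingSystemMXBean;

// Minimal file-descriptor watcher. It reports on the JVM it runs in, so to
// watch the broker it would have to run inside the broker process; externally,
// lsof as above remains the simplest check.
public class FdWatch {
    public static void main(String[] args) throws InterruptedException {
        // On HotSpot on Linux the platform OS MXBean also implements the
        // Unix-specific interface that exposes file-descriptor counts.
        UnixOperatingSystemMXBean os =
                (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        while (true) {
            System.out.printf("open fds: %d / max: %d%n",
                    os.getOpenFileDescriptorCount(),
                    os.getMaxFileDescriptorCount());
            Thread.sleep(10_000); // a count that climbs and never falls suggests a leak
        }
    }
}
{code}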
> Broker leaks memory and file descriptors after sudden client disconnects
> ------------------------------------------------------------------------
>
> Key: KAFKA-6529
> URL: https://issues.apache.org/jira/browse/KAFKA-6529
> Project: Kafka
> Issue Type: Bug
> Components: network
> Affects Versions: 0.11.0.2, 1.0.0
> Reporter: Graham Campbell
> Priority: Major
> Fix For: 0.11.0.3, 1.0.1, 1.1.0
>
>
> If a producer forcefully disconnects from a broker while it has staged
> receives, that connection enters a limbo state where it is no longer
> processed by the SocketServer.Processor, leaking the file descriptor for the
> socket and the memory used for the staged receive queue for that connection.
> We noticed this during an upgrade from 0.9.0.2 to 0.11.0.2. Immediately after
> the rolling restart to upgrade, open file descriptors on the brokers started
> climbing uncontrollably. In a few cases brokers reached our configured max
> open files limit of 100k and crashed before we rolled back.
> We tracked this down to a buildup of muted connections in the
> Selector.closingChannels list. If a client disconnects from the broker with
> multiple pending produce requests, when the broker attempts to send an ack to
> the client it receives an IOException because the TCP socket has been closed.
> This triggers the Selector to close the channel, but because the channel
> still has pending staged receives, it is added to Selector.closingChannels so
> those requests can be processed. However, because the exception was raised
> while sending a response, the SocketServer.Processor has marked the channel
> as muted and will never process it again.
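To make that limbo state concrete, here is a toy model of the drain loop; this
is a deliberate simplification, not actual Kafka code, and ClosingChannelsToy,
Channel, and pollClosingChannels are invented stand-ins for the real Selector
internals:
{code:java}
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of the leak: entries in closingChannels are only drained when the
// processor still serves the channel, but a muted channel is skipped forever.
public class ClosingChannelsToy {
    static final class Channel {
        final Queue<String> stagedReceives = new ArrayDeque<>();
        boolean muted; // set by the processor while a response is in flight
    }

    // Simplified stand-in for Selector.closingChannels.
    static final Map<String, Channel> closingChannels = new HashMap<>();

    // One poll iteration over channels that are closing but still have work.
    static void pollClosingChannels() {
        closingChannels.values().removeIf(ch -> {
            if (ch.muted)
                return false;          // muted channels are never processed -> leak
            ch.stagedReceives.clear(); // process the remaining requests, then...
            return true;               // ...really close the channel, freeing the fd
        });
    }

    public static void main(String[] args) {
        Channel ch = new Channel();
        ch.stagedReceives.add("produce-1");
        ch.stagedReceives.add("produce-2");
        ch.muted = true; // the send failed while a response was in flight
        closingChannels.put("conn-1", ch);

        pollClosingChannels();
        // conn-1 is still present: its socket fd and queued receives are leaked.
        System.out.println("leaked channels: " + closingChannels.keySet());
    }
}
{code}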
> *Reproduced by:*
> Start a Kafka broker/cluster.
> A client produces several messages and then disconnects abruptly (e.g.
> _./rdkafka_performance -P -x 100 -b broker:9092 -t test_topic_); a sketch of
> such a forced disconnect follows below.
> The broker then leaks the file descriptor previously used for the TCP socket
> and the memory for the unprocessed requests.
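A minimal sketch of the forced-disconnect step, assuming a broker listening on
localhost:9092; the payload here is a placeholder rather than valid
produce-request bytes, so unlike rdkafka_performance it will not by itself
stage real requests on the broker:
{code:java}
import java.net.InetSocketAddress;
import java.net.Socket;

// Abrupt-disconnect sketch: SO_LINGER with a zero timeout makes close() send a
// TCP RST instead of a graceful FIN, i.e. a "forceful" disconnect.
public class AbruptDisconnect {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("localhost", 9092));
            socket.setSoLinger(true, 0); // close() will reset the connection
            byte[] placeholder = new byte[64]; // stand-in for pipelined requests
            socket.getOutputStream().write(placeholder);
            socket.getOutputStream().flush();
        } // hard close while the broker still has the requests pending
    }
}
{code}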
> *Proposed solution (which we've implemented internally)*
> Whenever an exception is encountered while writing to a socket in
> Selector.pollSelectionKeys(...), record that the connection failed a send by
> adding the KafkaChannel ID to Selector.failedSends, then re-raise the
> exception so the existing socket-disconnection logic still runs. Since every
> exception raised in this function triggers a disconnect, we also treat any
> exception while writing to the socket as a failed send.
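A self-contained sketch of the proposed bookkeeping; the Channel interface and
attemptWrite(...) below are simplified stand-ins for KafkaChannel and the
Selector write path, which differ across Kafka versions, so this shows the
idea rather than the actual patch:
{code:java}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed failed-send tracking; not the real Selector.
public class FailedSendSketch {
    // Hypothetical stand-in for KafkaChannel.
    interface Channel {
        String id();
        void write() throws IOException; // throws if the peer reset the socket
    }

    // Simplified stand-in for Selector.failedSends.
    private final Set<String> failedSends = new HashSet<>();

    // Mirrors the write path of Selector.pollSelectionKeys(...) per the proposal.
    void attemptWrite(Channel channel) throws IOException {
        try {
            channel.write();
        } catch (IOException e) {
            // Record the failed send so the close path can skip the staged-receive
            // handling that would otherwise park the channel in closingChannels.
            failedSends.add(channel.id());
            throw e; // re-raise so the existing disconnection logic still runs
        }
    }

    boolean sendFailed(String channelId) {
        return failedSends.contains(channelId);
    }
}
{code}
With the failed send recorded, the close path can drop the channel immediately
instead of staging it in closingChannels, releasing both the descriptor and
the queued receives.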