[ 
https://issues.apache.org/jira/browse/KAFKA-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320671#comment-17320671
 ] 

GeoffreyStark commented on KAFKA-8714:
--------------------------------------

This may be the same issue as the one I created:

https://issues.apache.org/jira/browse/KAFKA-12665

> CLOSE_WAIT connections piling up on the broker
> ----------------------------------------------
>
>                 Key: KAFKA-8714
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8714
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.1.0, 2.3.0
>         Environment: Linux 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
> 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Rajdeep Mukherjee
>            Priority: Major
>         Attachments: Screenshot from 2019-07-25 11-53-24.png, 
> consumer_multiprocessing.py, producer_multiprocessing.py
>
>
> We are experiencing an issue where `CLOSE_WAIT` connections pile up on the 
> brokers, leading to a `Too many open files` error and eventually to a crash 
> of the affected broker. After some digging, we realized that this happens 
> when multiple clients (producers or consumers) close their connections 
> within a brief interval of time, that is, when the rate of client connection 
> closes increases. 
> The actual error that we encountered was:
> {code:java}
> [2019-07-18 00:03:27,861] ERROR Error while accepting connection
> (kafka.network.Acceptor) java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) 
> at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) 
> at kafka.network.Acceptor.accept(SocketServer.scala:326)
> at kafka.network.Acceptor.run(SocketServer.scala:269)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> When the error was encountered, the number of CLOSE_WAIT connections on the 
> broker was 200,000 and the number of ESTABLISHED connections was 
> approximately 15,000.
> The attachment shows the issue; the sharp dip in the graph is the point where 
> the broker restarted.
> We encountered this problem in both Kafka 0.10.1 and Kafka 2.3.0.
> The client version we used to reproduce it was:
>  
> {code:java}
> confluent-kafka==1.1.0
> librdkafka v1.1.0
> {code}
>  
> Steps to reproduce:
> I have attached the scripts we used for reproducing the issue. 
> In our QA environment we were able to reproduce the issue in the 
> following way:
>  * We spun up a 5-node Kafka v2.3.0 cluster.
>  * We prepared a Python script that spins up on the order of 500+ 
> producer processes and the same number of consumer processes, with logic to 
> randomly close the producer and consumer connections at a high frequency, on 
> the order of 10 closes per second, for 5 minutes.
>  * On the broker side, we watched for CLOSE_WAIT connections using 
> `lsof` and saw sustained CLOSE_WAIT connections that persisted until we 
> restarted Kafka on the corresponding broker.
> The commands to run the producer and consumer scripts are:
> {code:java}
> python producer_multiprocessing.py <time in seconds> <number of processes> 
> <sleep in seconds between produce> true true
> python consumer_multiprocessing.py <time in seconds> <number of processes> 0 
> true
> {code}
>  
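
The broker-side check in the reproduction steps (watching for CLOSE_WAIT connections) can also be done without `lsof`. The following is a minimal sketch, not part of the original report, that counts TCP states for one listener port by parsing the Linux `/proc/net/tcp` table. The port 9092 and the function name `count_states` are assumptions for illustration; adjust the port to match the broker's listener.

```python
from collections import Counter

# Linux TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_states(proc_net_tcp_text, local_port):
    """Count sockets per TCP state whose local port equals local_port."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is the local address as HEXIP:HEXPORT, fields[3] the state.
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port == local_port:
            # Unmapped state codes are counted under their raw hex value.
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


if __name__ == "__main__":
    # 9092 is an assumed listener port, not taken from the report.
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                print(path, dict(count_states(f.read(), 9092)))
        except FileNotFoundError:
            pass  # not on Linux, or IPv6 disabled
```

Running this periodically on a broker while the client scripts are closing connections should show whether the `CLOSE_WAIT` count keeps growing after the clients have exited.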



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
