Allen Wang created KAFKA-2096:
---------------------------------

             Summary: Enable keepalive socket option for broker
                 Key: KAFKA-2096
                 URL: https://issues.apache.org/jira/browse/KAFKA-2096
             Project: Kafka
          Issue Type: Improvement
          Components: network
    Affects Versions: 0.8.2.1
            Reporter: Allen Wang
            Assignee: Jun Rao
            Priority: Critical


We run a Kafka 0.8.2.1 cluster in AWS with a large number of producers (> 10000).
Also, the number of producer instances scales up and down significantly on a
daily basis.

The issue we found is that after 10 days, the open file descriptor count
approaches the limit of 32K. An investigation of these open file descriptors
shows that a significant portion of them belong to client instances that were
terminated during scale-down. Somehow they still show as "ESTABLISHED" in
netstat. We suspect that the AWS firewall between the client and broker causes
this issue.

We attempted to use the "keepalive" socket option to reduce this socket leak on
the broker, and it appears to be working. Specifically, we added this line to
kafka.network.Acceptor.accept():

      socketChannel.socket().setKeepAlive(true)
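A minimal, self-contained sketch of the same idea, assuming plain Java NIO rather than Kafka's actual Acceptor/Processor classes. The names `KeepAliveAcceptSketch`, `configureAccepted` and `acceptOneAndCheck` are hypothetical, and the non-blocking and TCP_NODELAY settings are illustrative additions not taken from this issue. It opens a loopback listener, accepts one connection, enables SO_KEEPALIVE on the accepted socket, and reads the option back.

```scala
import java.net.{InetAddress, InetSocketAddress}
import java.nio.channels.{ServerSocketChannel, SocketChannel}

object KeepAliveAcceptSketch {

  // Hypothetical helper: applies per-connection options to a freshly accepted
  // channel. Only setKeepAlive(true) comes from the issue; the other two
  // options are illustrative.
  def configureAccepted(channel: SocketChannel): SocketChannel = {
    channel.configureBlocking(false)
    channel.socket().setTcpNoDelay(true)
    // Ask the OS to send TCP keepalive probes on idle connections so that a
    // peer which vanished without sending FIN/RST is eventually detected.
    channel.socket().setKeepAlive(true)
    channel
  }

  // Accepts one connection on a loopback listener and reports whether
  // SO_KEEPALIVE is enabled on the server-side (broker-side) socket.
  def acceptOneAndCheck(): Boolean = {
    val server = ServerSocketChannel.open()
    server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress, 0))
    // Blocking connect; it completes because the listener is already bound.
    val client = SocketChannel.open(server.getLocalAddress)
    try {
      val accepted = configureAccepted(server.accept())
      try accepted.socket().getKeepAlive
      finally accepted.close()
    } finally {
      client.close()
      server.close()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"keepalive enabled: ${acceptOneAndCheck()}")
}
```

Because the option is set on the accepted (broker-side) socket, no client changes are needed. With setKeepAlive alone, the probe timing is governed by the operating system's keepalive settings, not by Java.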

Our experiment with this change confirmed that netstat entries for terminated
client instances were probed as configured in the operating system. After the
configured number of probes, the OS determined that the peer was no longer
alive and removed the entry, possibly after Kafka hit an error reading from the
channel and closed it. Our experiment also shows that after a few days, the
instance running this change kept a stable low point in its open file
descriptor count, whereas on the other instances the low point keeps increasing
day to day.
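How long a dead peer lingers depends on the kernel's keepalive settings, which this issue does not list. As a rough worked estimate, assuming Linux with its default sysctl values (net.ipv4.tcp_keepalive_time = 7200 s of idle time before the first probe, net.ipv4.tcp_keepalive_intvl = 75 s between probes, net.ipv4.tcp_keepalive_probes = 9 unanswered probes), an idle connection to a terminated client is dropped after roughly:

```latex
T_{\text{detect}} \approx t_{\text{idle}} + n_{\text{probes}} \cdot t_{\text{intvl}}
  = 7200\,\text{s} + 9 \times 75\,\text{s} = 7875\,\text{s} \approx 2\,\text{h}\ 11\,\text{min}
```

Lowering these sysctls shortens that window at the cost of more probe traffic; the values used in the experiment above are not stated.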


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
