[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053195#comment-17053195
 ] 

li xiangyuan commented on KAFKA-9648:
-------------------------------------

[~junrao]: Yes, 1412 is quite small. I noticed that but couldn't understand 
why. The tcpdump file shows that after the connection is established, the 
server's window size suddenly drops from xxxxx to 394, while the producer's 
window size stays at 31088. I think explaining the reason is beyond my 
ability and knowledge ...

> kafka server should resize backlog when create serversocket
> -----------------------------------------------------------
>
>                 Key: KAFKA-9648
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9648
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.10.0.1
>            Reporter: li xiangyuan
>            Priority: Minor
>
> I have described a mysterious problem 
> (https://issues.apache.org/jira/browse/KAFKA-9211). In that issue I found 
> that the Kafka server triggers TCP congestion control under some conditions. 
> We have finally found the root cause.
> When a Kafka server restarts for any reason and a preferred replica leader 
> election is then executed, many replica leaderships are handed back to it, 
> which triggers a cluster metadata update. All clients then establish 
> connections to this server. At that moment many TCP connection requests are 
> waiting in the TCP SYN queue and then in the accept queue. 
> Kafka creates the server socket in SocketServer.scala: 
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> This method has a second parameter, "backlog"; 
> min(backlog, tcp_max_syn_backlog) determines the queue length. Because Kafka 
> doesn't set it, the default value of 50 is used.
> If this queue is full and tcp_syncookies = 0, new connection requests are 
> rejected. If tcp_syncookies = 1, the TCP SYN cookie mechanism is triggered. 
> This mechanism allows Linux to handle more TCP SYN requests, but it loses 
> many TCP extension options, including "wscale", the one that allows a TCP 
> connection to send many more bytes per TCP packet. Because SYN cookies were 
> triggered, wscale is lost, and the TCP connection handles network traffic 
> very slowly for its entire lifetime, until it is closed and another TCP 
> connection is established.
> So after a preferred replica election is executed, many new TCP connections 
> are established without wscale set, and much of the network traffic to this 
> server is very slow.
> I'm not sure whether newer Linux versions have resolved this problem, but 
> Kafka should also set the backlog to a larger value. We have now changed it 
> to 512, and everything seems OK.
>  
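
For illustration, a minimal sketch of the change described above, using the 
two-argument form of bind. This is not the actual Kafka patch: the class name 
BacklogBindSketch, the helper openWithBacklog, and the loopback address are 
made up for this example, and 512 is simply the value reported to work above. 
The kernel may still cap the requested backlog according to its own settings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BacklogBindSketch {
    // Value taken from the issue description; treat it as a starting point, not a recommendation.
    static final int BACKLOG = 512;

    // Hypothetical helper: opens a non-blocking server channel and binds it
    // with an explicit backlog instead of relying on the Java default of 50.
    static ServerSocketChannel openWithBacklog(InetSocketAddress address, int backlog) {
        try {
            ServerSocketChannel serverChannel = ServerSocketChannel.open();
            serverChannel.configureBlocking(false);
            // ServerSocket.bind(SocketAddress, int): the second argument is the requested backlog.
            serverChannel.socket().bind(address, backlog);
            return serverChannel;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, so the sketch runs anywhere.
        try (ServerSocketChannel ch = openWithBacklog(new InetSocketAddress("127.0.0.1", 0), BACKLOG)) {
            System.out.println("listening on " + ch.getLocalAddress());
        }
    }
}
```

In a real broker the backlog would more likely come from configuration than 
from a constant, so that operators can tune it together with the kernel 
limits.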



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
