Hi Lukas,

OK it's a kernel issue on my reverse proxy. Look below, haproxy detected a
timeout after 30s of idle :

(fd 14 faces the client, fd 15 the server)

epoll_wait(0, 0x1aebdd8, 0xc8, 0)       = 0
gettimeofday({1371978241, 119519}, NULL) = 0
recv(15, 
"-\nR\216+f\213%G\3539\"\270\246{9\3037\272\317N\215\0\226\333;\334\320y\374Z.'"...,
 8030, 0) = 8030
send(14, 
"-\nR\216+f\213%G\3539\"\270\246{9\3037\272\317N\215\0\226\333;\334\320y\374Z.'"...,
 8030, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = 4280
gettimeofday({1371978241, 120296}, NULL) = 0
epoll_wait(0, 0x1aebdd8, 0xc8, 0)       = 0
gettimeofday({1371978241, 120635}, NULL) = 0
recv(15, "A\23,A\17\221\234k\271!\313C\245\267a 
Pp\316\204-9\342E\360\3438\255\322\247(-J"..., 4280, 0) = 4280
send(14, 
"i.\302#t\35\300\354~G\312\2606\266\201\376\254}~\362\372l_\226\31\5\210{\344\361\10`\30"...,
 3750, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = -1 EAGAIN (Resource temporarily 
unavailable)
epoll_ctl(0, 0x3, 0xe, 0xa2028)         = 0

==> buffers are full for fd #14. Nothing happens on this FD for the
    next 30 seconds, until we decide it's over and close the connection :

epoll_wait(0, 0x1aebdd8, 0xc8, 0x93)    = 0
gettimeofday({1371978271, 122894}, NULL) = 0
setsockopt(14, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
close(14)                               = 0
shutdown(15, 1 /* send */)              = 0
close(15)                               = 0
sendto(10, "<134>Jun 23 11:04:31 haproxy[1153"..., 318, 
MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(514), 
sin_addr=inet_addr("10.8.1.2")}, 16) = 318


Now the problem is that the capture taken on the same side shows a different
story : this happens after the recovery from some losses :

10:03:47.918722 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 
3500432327:3500432327(0) ack 3270212996 win 398 <nop,nop,timestamp 2247579400 
21913153> (DF) (ttl 53, id 27788, len 52)
10:03:48.001500 62.212.114.60.81 > 88.191.124.161.45154: . [tcp sum ok] 
3270212996:3270213496(500) ack 3500432327 win 1500 <nop,nop,timestamp 21913156 
2247578896> (DF) (ttl 128, id 24160, len 552)
10:03:48.025513 62.212.114.60.81 > 88.191.124.161.45154: . [tcp sum ok] 
3270213496:3270213996(500) ack 3500432327 win 1500 <nop,nop,timestamp 21913156 
2247578896> (DF) (ttl 128, id 24161, len 552)
10:03:48.063247 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 
3500432327:3500432327(0) ack 3270213496 win 405 <nop,nop,timestamp 2247579436 
21913156> (DF) (ttl 53, id 27789, len 52)
10:03:48.086935 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 
3500432327:3500432327(0) ack 3270213996 win 413 <nop,nop,timestamp 2247579442 
21913156> (DF) (ttl 53, id 27790, len 52)
10:03:48.344722 62.212.114.60.81 > 88.191.124.161.45154: R [tcp sum ok] 
3270224496:3270224496(0) ack 3500432327 win 1500 <nop,nop,timestamp 21913400 
2247579442> (DF) (ttl 128, id 24183, len 52)

When the RST happens, all bytes were acked, so for sure the writes should
have retriggerred. It's probably time to upgrade this kernel. Now that I'm
thinking about it, I believe that the issues started when I switched to use
this machine :-/

Best regards,
Willy


Reply via email to