Hello. I wonder if anyone can assist with this problem, reported by one of our customers.
The load balancer is running HAProxy 1.4-rc1, with a modified version of the HTTP ECV patch applied. The customer is using ECV to check the status of a pair of IIS web servers: listen web 10.3.4.150:80 mode tcp option httpchk GET /gccheck.cfm HTTP/1.0 http-check expect rstring gCokay balance source server realserver1 10.4.1.6:80 weight 1 check inter 2000 rise 2 fall 3 server realserver2 10.4.1.16:80 weight 1 check inter 2000 rise 2 fall 3 server backup 127.0.0.1:9081 backup option redispatch option abortonclose maxconn 40000 log global We are seeing both real servers repeatedly going on- and off-line with a period of tens of seconds. Packet tracing, stracing, and adding debug code to HAProxy itself has revealed that the real servers are always responding correctly, but HAProxy is sometimes receiving only part of the response. It appears that the real servers are sending the test page as three separate packets. HAProxy receives the contents of one, two, or three packets, apparently randomly. Naturally, the health check only succeeds when all three packets' data are seen by HAProxy. If HAProxy and the real servers are modified to use a plain HTML page for the health check, the response is in the form of a single packet and the checks do not fail. Attached is an example packet trace, firstly a failed check, then a successful one. 10.3.4.90 is the load balancer, 10.4.1.6 is the real server. Wireshark does not seem to recognise the responses as HTTP - they are shown as [TCP segment of a reassembled PDU]. In the first check, the load balancer appears to close the connection early, in frame 9, after receiving only one of the response packets. In the second check, all three packets are received before the connection is closed. strace and dumping data from within HAProxy shows that data is discarded after the connection is closed. The other attached file shows a fragment from stracing HAProxy - here, the first request is successful, the second is truncated. (I've elided customer details in the response.) This appears not to be an HAProxy problem, as it is not receiving the full real server response. Any ideas for further testing I can perform? Thanks, Nick. -- Nick Chalk. Loadbalancer.org Ltd. Phone: +44 (0)870 443 8779 http://www.loadbalancer.org/
No. Time Source Destination Protocol Info 3 0.066490 10.3.4.90 10.4.1.6 TCP 60455 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=355055022 TSER=0 WS=6 4 0.066944 10.4.1.6 10.3.4.90 TCP http > 60455 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0 TSV=0 TSER=0 5 0.066970 10.3.4.90 10.4.1.6 TCP 60455 > http [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=355055022 TSER=0 6 0.067000 10.3.4.90 10.4.1.6 HTTP GET //gccheck.cfm HTTP/1.0 7 0.069538 10.4.1.6 10.3.4.90 TCP [TCP segment of a reassembled PDU] 8 0.069551 10.3.4.90 10.4.1.6 TCP 60455 > http [ACK] Seq=31 Ack=124 Win=5888 Len=0 TSV=355055025 TSER=26878008 9 0.069597 10.3.4.90 10.4.1.6 TCP 60455 > http [FIN, ACK] Seq=31 Ack=124 Win=5888 Len=0 TSV=355055025 TSER=26878008 10 0.069638 10.4.1.6 10.3.4.90 TCP [TCP segment of a reassembled PDU] 11 0.069653 10.3.4.90 10.4.1.6 TCP 60455 > http [RST] Seq=31 Win=0 Len=0 12 0.069656 10.4.1.6 10.3.4.90 TCP [TCP segment of a reassembled PDU] 13 0.069661 10.3.4.90 10.4.1.6 TCP 60455 > http [RST] Seq=31 Win=0 Len=0 14 0.069663 10.4.1.6 10.3.4.90 TCP http > 60455 [FIN, ACK] Seq=424 Ack=31 Win=65505 Len=0 TSV=26878008 TSER=355055022 15 0.069668 10.3.4.90 10.4.1.6 TCP 60455 > http [RST] Seq=31 Win=0 Len=0 16 0.069887 10.4.1.6 10.3.4.90 TCP http > 60455 [ACK] Seq=425 Ack=32 Win=65505 Len=0 TSV=26878008 TSER=355055025 17 0.069894 10.3.4.90 10.4.1.6 TCP 60455 > http [RST] Seq=32 Win=0 Len=0 132 2.070376 10.3.4.90 10.4.1.6 TCP 60469 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=355057026 TSER=0 WS=6 133 2.070837 10.4.1.6 10.3.4.90 TCP http > 60469 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0 TSV=0 TSER=0 134 2.070849 10.3.4.90 10.4.1.6 TCP 60469 > http [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=355057026 TSER=0 135 2.070866 10.3.4.90 10.4.1.6 HTTP GET //gccheck.cfm HTTP/1.0 136 2.073582 10.4.1.6 10.3.4.90 TCP [TCP segment of a reassembled PDU] 137 2.073593 10.3.4.90 10.4.1.6 TCP 60469 > http [ACK] Seq=31 Ack=124 Win=5888 Len=0 TSV=355057029 TSER=26878029 138 2.073595 10.4.1.6 10.3.4.90 TCP [TCP segment of a reassembled PDU] 139 2.073600 10.3.4.90 10.4.1.6 TCP 60469 > http [ACK] Seq=31 Ack=369 Win=6912 Len=0 TSV=355057029 TSER=26878029 140 2.073603 10.4.1.6 10.3.4.90 TCP [TCP segment of a reassembled PDU] 141 2.073608 10.3.4.90 10.4.1.6 TCP 60469 > http [ACK] Seq=31 Ack=424 Win=6912 Len=0 TSV=355057029 TSER=26878029 142 2.073653 10.3.4.90 10.4.1.6 TCP 60469 > http [FIN, ACK] Seq=31 Ack=424 Win=6912 Len=0 TSV=355057029 TSER=26878029 143 2.073683 10.4.1.6 10.3.4.90 TCP http > 60469 [FIN, ACK] Seq=424 Ack=31 Win=65505 Len=0 TSV=26878029 TSER=355057026 144 2.073692 10.3.4.90 10.4.1.6 TCP 60469 > http [ACK] Seq=32 Ack=425 Win=6912 Len=0 TSV=355057029 TSER=26878029 145 2.073932 10.4.1.6 10.3.4.90 TCP http > 60469 [ACK] Seq=425 Ack=32 Win=65505 Len=0 TSV=26878029 TSER=355057029
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 8 fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0 connect(8, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.4.1.6")}, 16) = -1 EINPROGRESS (Operation now in progress) sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) epoll_ctl(3, EPOLL_CTL_ADD, 8, {EPOLLOUT, {u32=8, u64=8}}) = 0 accept(7, 0x7fff72ffb770, [29422312683995264]) = -1 EAGAIN (Resource temporarily unavailable) epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0 accept(6, 0x7fff72ffb660, [30493168289972352]) = -1 EAGAIN (Resource temporarily unavailable) epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLIN, {u32=6, u64=6}}) = 0 accept(4, 0x7fff72ffb660, [30325492766736512]) = -1 EAGAIN (Resource temporarily unavailable) epoll_ctl(3, EPOLL_CTL_ADD, 4, {EPOLLIN, {u32=4, u64=4}}) = 0 epoll_wait(3, {{EPOLLOUT, {u32=8, u64=8}}}, 9, 993) = 1 sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 29 recvfrom(8, 0x69f800, 16384, 256, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) epoll_ctl(3, EPOLL_CTL_MOD, 8, {EPOLLIN, {u32=8, u64=8}}) = 0 epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 9, 992) = 1 recvfrom(8, "HTTP/1.1 200 OK\r\nConnection: close\r\nDate: Wed, 10 Feb 2010 13:10:04 GMT\r\nServer: Microsoft-IIS/6.0\r\nX-Powered-By: ASP.NET\r\nContent-type: text/html\r\nPage-Completion-Status: Normal\r\nPage-Completion-Status: Normal\r\nSet-Cookie: CFID=8654065; expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\nSet-Cookie: CFTOKEN=85114548; expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\n\r\n \r\n\r\n</head>\r\n\r\n<body lang=EN-US style='tab-interval:36.0pt'>\r\n\r\n<div class=Section1>\r\n\r\n<p class=MsoNormal><span class=GramE><span lang=EN-GB>..........</span></span><span\r\nlang=EN-GB> test page </span></p>\r\n\r\n<p class=MsoNormal><span lang=EN-GB><o:p> </o:p></span></p>\r\n\r\n<p class=MsoNormal><span class=SpellE><span class=GramE><span lang=EN-GB>gCokay</span></span></span><span\r\nlang=EN-GB> </span></p>\r\n\r\n<p class=MsoNormal><span lang=EN-GB><o:p> </o:p></span></p>\r\n\r\n<p class=MsoNormal><span lang=EN-GB>website okay</span></p>\r\n\r\n</div>\r\n\r\n\r\n</body>\r\n\r\n</html>\r\n", 16384, MSG_WAITALL, NULL, NULL) = 944 close(8) = 0 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 8 fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0 connect(8, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.4.1.6")}, 16) = -1 EINPROGRESS (Operation now in progress) sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) epoll_ctl(3, EPOLL_CTL_ADD, 8, {EPOLLOUT, {u32=8, u64=8}}) = 0 epoll_wait(3, {{EPOLLOUT, {u32=8, u64=8}}}, 9, 988) = 1 sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 29 recvfrom(8, 0x69f800, 16384, 256, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) epoll_ctl(3, EPOLL_CTL_MOD, 8, {EPOLLIN, {u32=8, u64=8}}) = 0 epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 9, 988) = 1 recvfrom(8, "HTTP/1.1 200 OK\r\nConnection: close\r\nDate: Wed, 10 Feb 2010 13:10:06 GMT\r\nServer: Microsoft-IIS/6.0\r\nX-Powered-By: ASP.NET\r\nContent-type: text/html\r\nPage-Completion-Status: Normal\r\nPage-Completion-Status: Normal\r\nSet-Cookie: CFID=8654066; expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\nSet-Cookie: CFTOKEN=47946190; expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\n\r\n", 16384, MSG_WAITALL, NULL, NULL) = 368 close(8) = 0