Hello.
I wonder if anyone can assist with this problem, reported by one of
our customers.
The load balancer is running HAProxy 1.4-rc1, with a modified version
of the HTTP ECV patch applied. The customer is using ECV to check the
status of a pair of IIS web servers:
listen web 10.3.4.150:80
mode tcp
option httpchk GET /gccheck.cfm HTTP/1.0
http-check expect rstring gCokay
balance source
server realserver1 10.4.1.6:80 weight 1 check inter 2000 rise 2 fall 3
server realserver2 10.4.1.16:80 weight 1 check inter 2000 rise 2 fall 3
server backup 127.0.0.1:9081 backup
option redispatch
option abortonclose
maxconn 40000
log global
We are seeing both real servers repeatedly going on- and off-line with
a period of tens of seconds. Packet tracing, stracing, and adding
debug code to HAProxy itself has revealed that the real servers are
always responding correctly, but HAProxy is sometimes receiving only
part of the response.
It appears that the real servers are sending the test page as three
separate packets. HAProxy receives the contents of one, two, or three
packets, apparently randomly. Naturally, the health check only
succeeds when all three packets' data are seen by HAProxy. If HAProxy
and the real servers are modified to use a plain HTML page for the
health check, the response is in the form of a single packet and the
checks do not fail.
Attached is an example packet trace, firstly a failed check, then a
successful one. 10.3.4.90 is the load balancer, 10.4.1.6 is the real
server. Wireshark does not seem to recognise the responses as HTTP -
they are shown as [TCP segment of a reassembled PDU].
In the first check, the load balancer appears to close the connection
early, in frame 9, after receiving only one of the response packets.
In the second check, all three packets are received before the
connection is closed. strace and dumping data from within HAProxy
shows that data is discarded after the connection is closed. The other
attached file shows a fragment from stracing HAProxy - here, the first
request is successful, the second is truncated. (I've elided customer
details in the response.)
This appears not to be an HAProxy problem, as it is not receiving the
full real server response. Any ideas for further testing I can
perform?
Thanks,
Nick.
--
Nick Chalk.
Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/
No. Time Source Destination Protocol Info
3 0.066490 10.3.4.90 10.4.1.6 TCP 60455
> http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=355055022 TSER=0 WS=6
4 0.066944 10.4.1.6 10.3.4.90 TCP http >
60455 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0 TSV=0 TSER=0
5 0.066970 10.3.4.90 10.4.1.6 TCP 60455
> http [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=355055022 TSER=0
6 0.067000 10.3.4.90 10.4.1.6 HTTP GET
//gccheck.cfm HTTP/1.0
7 0.069538 10.4.1.6 10.3.4.90 TCP [TCP
segment of a reassembled PDU]
8 0.069551 10.3.4.90 10.4.1.6 TCP 60455
> http [ACK] Seq=31 Ack=124 Win=5888 Len=0 TSV=355055025 TSER=26878008
9 0.069597 10.3.4.90 10.4.1.6 TCP 60455
> http [FIN, ACK] Seq=31 Ack=124 Win=5888 Len=0 TSV=355055025 TSER=26878008
10 0.069638 10.4.1.6 10.3.4.90 TCP [TCP
segment of a reassembled PDU]
11 0.069653 10.3.4.90 10.4.1.6 TCP 60455
> http [RST] Seq=31 Win=0 Len=0
12 0.069656 10.4.1.6 10.3.4.90 TCP [TCP
segment of a reassembled PDU]
13 0.069661 10.3.4.90 10.4.1.6 TCP 60455
> http [RST] Seq=31 Win=0 Len=0
14 0.069663 10.4.1.6 10.3.4.90 TCP http >
60455 [FIN, ACK] Seq=424 Ack=31 Win=65505 Len=0 TSV=26878008 TSER=355055022
15 0.069668 10.3.4.90 10.4.1.6 TCP 60455
> http [RST] Seq=31 Win=0 Len=0
16 0.069887 10.4.1.6 10.3.4.90 TCP http >
60455 [ACK] Seq=425 Ack=32 Win=65505 Len=0 TSV=26878008 TSER=355055025
17 0.069894 10.3.4.90 10.4.1.6 TCP 60455
> http [RST] Seq=32 Win=0 Len=0
132 2.070376 10.3.4.90 10.4.1.6 TCP 60469
> http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=355057026 TSER=0 WS=6
133 2.070837 10.4.1.6 10.3.4.90 TCP http >
60469 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0 TSV=0 TSER=0
134 2.070849 10.3.4.90 10.4.1.6 TCP 60469
> http [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=355057026 TSER=0
135 2.070866 10.3.4.90 10.4.1.6 HTTP GET
//gccheck.cfm HTTP/1.0
136 2.073582 10.4.1.6 10.3.4.90 TCP [TCP
segment of a reassembled PDU]
137 2.073593 10.3.4.90 10.4.1.6 TCP 60469
> http [ACK] Seq=31 Ack=124 Win=5888 Len=0 TSV=355057029 TSER=26878029
138 2.073595 10.4.1.6 10.3.4.90 TCP [TCP
segment of a reassembled PDU]
139 2.073600 10.3.4.90 10.4.1.6 TCP 60469
> http [ACK] Seq=31 Ack=369 Win=6912 Len=0 TSV=355057029 TSER=26878029
140 2.073603 10.4.1.6 10.3.4.90 TCP [TCP
segment of a reassembled PDU]
141 2.073608 10.3.4.90 10.4.1.6 TCP 60469
> http [ACK] Seq=31 Ack=424 Win=6912 Len=0 TSV=355057029 TSER=26878029
142 2.073653 10.3.4.90 10.4.1.6 TCP 60469
> http [FIN, ACK] Seq=31 Ack=424 Win=6912 Len=0 TSV=355057029 TSER=26878029
143 2.073683 10.4.1.6 10.3.4.90 TCP http >
60469 [FIN, ACK] Seq=424 Ack=31 Win=65505 Len=0 TSV=26878029 TSER=355057026
144 2.073692 10.3.4.90 10.4.1.6 TCP 60469
> http [ACK] Seq=32 Ack=425 Win=6912 Len=0 TSV=355057029 TSER=26878029
145 2.073932 10.4.1.6 10.3.4.90 TCP http >
60469 [ACK] Seq=425 Ack=32 Win=65505 Len=0 TSV=26878029 TSER=355057029
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 8
fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(8, {sa_family=AF_INET, sin_port=htons(80),
sin_addr=inet_addr("10.4.1.6")}, 16) = -1 EINPROGRESS (Operation now in
progress)
sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL,
NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_ADD, 8, {EPOLLOUT, {u32=8, u64=8}}) = 0
accept(7, 0x7fff72ffb770, [29422312683995264]) = -1 EAGAIN (Resource
temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
accept(6, 0x7fff72ffb660, [30493168289972352]) = -1 EAGAIN (Resource
temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLIN, {u32=6, u64=6}}) = 0
accept(4, 0x7fff72ffb660, [30325492766736512]) = -1 EAGAIN (Resource
temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_ADD, 4, {EPOLLIN, {u32=4, u64=4}}) = 0
epoll_wait(3, {{EPOLLOUT, {u32=8, u64=8}}}, 9, 993) = 1
sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL,
NULL, 0) = 29
recvfrom(8, 0x69f800, 16384, 256, 0, 0) = -1 EAGAIN (Resource temporarily
unavailable)
epoll_ctl(3, EPOLL_CTL_MOD, 8, {EPOLLIN, {u32=8, u64=8}}) = 0
epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 9, 992) = 1
recvfrom(8, "HTTP/1.1 200 OK\r\nConnection: close\r\nDate: Wed, 10 Feb 2010
13:10:04 GMT\r\nServer: Microsoft-IIS/6.0\r\nX-Powered-By:
ASP.NET\r\nContent-type: text/html\r\nPage-Completion-Status:
Normal\r\nPage-Completion-Status: Normal\r\nSet-Cookie: CFID=8654065;
expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\nSet-Cookie:
CFTOKEN=85114548; expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\n\r\n
\r\n\r\n</head>\r\n\r\n<body lang=EN-US
style='tab-interval:36.0pt'>\r\n\r\n<div class=Section1>\r\n\r\n<p
class=MsoNormal><span class=GramE><span
lang=EN-GB>..........</span></span><span\r\nlang=EN-GB> test page
</span></p>\r\n\r\n<p class=MsoNormal><span
lang=EN-GB><o:p> </o:p></span></p>\r\n\r\n<p class=MsoNormal><span
class=SpellE><span class=GramE><span
lang=EN-GB>gCokay</span></span></span><span\r\nlang=EN-GB>
</span></p>\r\n\r\n<p class=MsoNormal><span
lang=EN-GB><o:p> </o:p></span></p>\r\n\r\n<p class=MsoNormal><span
lang=EN-GB>website
okay</span></p>\r\n\r\n</div>\r\n\r\n\r\n</body>\r\n\r\n</html>\r\n", 16384,
MSG_WAITALL, NULL, NULL) = 944
close(8) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 8
fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(8, {sa_family=AF_INET, sin_port=htons(80),
sin_addr=inet_addr("10.4.1.6")}, 16) = -1 EINPROGRESS (Operation now in
progress)
sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL,
NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_ADD, 8, {EPOLLOUT, {u32=8, u64=8}}) = 0
epoll_wait(3, {{EPOLLOUT, {u32=8, u64=8}}}, 9, 988) = 1
sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL,
NULL, 0) = 29
recvfrom(8, 0x69f800, 16384, 256, 0, 0) = -1 EAGAIN (Resource temporarily
unavailable)
epoll_ctl(3, EPOLL_CTL_MOD, 8, {EPOLLIN, {u32=8, u64=8}}) = 0
epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 9, 988) = 1
recvfrom(8, "HTTP/1.1 200 OK\r\nConnection: close\r\nDate: Wed, 10 Feb 2010
13:10:06 GMT\r\nServer: Microsoft-IIS/6.0\r\nX-Powered-By:
ASP.NET\r\nContent-type: text/html\r\nPage-Completion-Status:
Normal\r\nPage-Completion-Status: Normal\r\nSet-Cookie: CFID=8654066;
expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\nSet-Cookie:
CFTOKEN=47946190; expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\n\r\n",
16384, MSG_WAITALL, NULL, NULL) = 368
close(8) = 0