Hello.

I wonder if anyone can assist with this problem, reported by one of
our customers.

The load balancer is running HAProxy 1.4-rc1, with a modified version
of the HTTP ECV patch applied. The customer is using ECV to check the
status of a pair of IIS web servers:

listen web 10.3.4.150:80
        mode tcp
        option httpchk GET /gccheck.cfm HTTP/1.0
        http-check expect rstring gCokay
        balance source
        server realserver1 10.4.1.6:80 weight 1 check  inter 2000 rise 2 fall 3
        server realserver2 10.4.1.16:80 weight 1 check  inter 2000 rise 2 fall 3
        server backup 127.0.0.1:9081 backup
        option redispatch
        option abortonclose
        maxconn 40000
        log global

We are seeing both real servers repeatedly going on- and off-line with
a period of tens of seconds. Packet tracing, stracing, and adding
debug code to HAProxy itself has revealed that the real servers are
always responding correctly, but HAProxy is sometimes receiving only
part of the response.

It appears that the real servers are sending the test page as three
separate packets. HAProxy receives the contents of one, two, or three
packets, apparently randomly. Naturally, the health check only
succeeds when all three packets' data are seen by HAProxy. If HAProxy
and the real servers are modified to use a plain HTML page for the
health check, the response is in the form of a single packet and the
checks do not fail.

Attached is an example packet trace, firstly a failed check, then a
successful one. 10.3.4.90 is the load balancer, 10.4.1.6 is the real
server. Wireshark does not seem to recognise the responses as HTTP -
they are shown as [TCP segment of a reassembled PDU].

In the first check, the load balancer appears to close the connection
early, in frame 9, after receiving only one of the response packets.
In the second check, all three packets are received before the
connection is closed. strace and dumping data from within HAProxy
shows that data is discarded after the connection is closed. The other
attached file shows a fragment from stracing HAProxy - here, the first
request is successful, the second is truncated. (I've elided customer
details in the response.)

This appears not to be an HAProxy problem, as it is not receiving the
full real server response. Any ideas for further testing I can
perform?

Thanks,
Nick.

-- 
Nick Chalk.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/
No.     Time        Source                Destination           Protocol Info
      3 0.066490    10.3.4.90             10.4.1.6              TCP      60455 
> http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=355055022 TSER=0 WS=6
      4 0.066944    10.4.1.6              10.3.4.90             TCP      http > 
60455 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0 TSV=0 TSER=0
      5 0.066970    10.3.4.90             10.4.1.6              TCP      60455 
> http [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=355055022 TSER=0
      6 0.067000    10.3.4.90             10.4.1.6              HTTP     GET 
//gccheck.cfm HTTP/1.0 
      7 0.069538    10.4.1.6              10.3.4.90             TCP      [TCP 
segment of a reassembled PDU]
      8 0.069551    10.3.4.90             10.4.1.6              TCP      60455 
> http [ACK] Seq=31 Ack=124 Win=5888 Len=0 TSV=355055025 TSER=26878008
      9 0.069597    10.3.4.90             10.4.1.6              TCP      60455 
> http [FIN, ACK] Seq=31 Ack=124 Win=5888 Len=0 TSV=355055025 TSER=26878008
     10 0.069638    10.4.1.6              10.3.4.90             TCP      [TCP 
segment of a reassembled PDU]
     11 0.069653    10.3.4.90             10.4.1.6              TCP      60455 
> http [RST] Seq=31 Win=0 Len=0
     12 0.069656    10.4.1.6              10.3.4.90             TCP      [TCP 
segment of a reassembled PDU]
     13 0.069661    10.3.4.90             10.4.1.6              TCP      60455 
> http [RST] Seq=31 Win=0 Len=0
     14 0.069663    10.4.1.6              10.3.4.90             TCP      http > 
60455 [FIN, ACK] Seq=424 Ack=31 Win=65505 Len=0 TSV=26878008 TSER=355055022
     15 0.069668    10.3.4.90             10.4.1.6              TCP      60455 
> http [RST] Seq=31 Win=0 Len=0
     16 0.069887    10.4.1.6              10.3.4.90             TCP      http > 
60455 [ACK] Seq=425 Ack=32 Win=65505 Len=0 TSV=26878008 TSER=355055025
     17 0.069894    10.3.4.90             10.4.1.6              TCP      60455 
> http [RST] Seq=32 Win=0 Len=0

    132 2.070376    10.3.4.90             10.4.1.6              TCP      60469 
> http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=355057026 TSER=0 WS=6
    133 2.070837    10.4.1.6              10.3.4.90             TCP      http > 
60469 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0 TSV=0 TSER=0
    134 2.070849    10.3.4.90             10.4.1.6              TCP      60469 
> http [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=355057026 TSER=0
    135 2.070866    10.3.4.90             10.4.1.6              HTTP     GET 
//gccheck.cfm HTTP/1.0 
    136 2.073582    10.4.1.6              10.3.4.90             TCP      [TCP 
segment of a reassembled PDU]
    137 2.073593    10.3.4.90             10.4.1.6              TCP      60469 
> http [ACK] Seq=31 Ack=124 Win=5888 Len=0 TSV=355057029 TSER=26878029
    138 2.073595    10.4.1.6              10.3.4.90             TCP      [TCP 
segment of a reassembled PDU]
    139 2.073600    10.3.4.90             10.4.1.6              TCP      60469 
> http [ACK] Seq=31 Ack=369 Win=6912 Len=0 TSV=355057029 TSER=26878029
    140 2.073603    10.4.1.6              10.3.4.90             TCP      [TCP 
segment of a reassembled PDU]
    141 2.073608    10.3.4.90             10.4.1.6              TCP      60469 
> http [ACK] Seq=31 Ack=424 Win=6912 Len=0 TSV=355057029 TSER=26878029
    142 2.073653    10.3.4.90             10.4.1.6              TCP      60469 
> http [FIN, ACK] Seq=31 Ack=424 Win=6912 Len=0 TSV=355057029 TSER=26878029
    143 2.073683    10.4.1.6              10.3.4.90             TCP      http > 
60469 [FIN, ACK] Seq=424 Ack=31 Win=65505 Len=0 TSV=26878029 TSER=355057026
    144 2.073692    10.3.4.90             10.4.1.6              TCP      60469 
> http [ACK] Seq=32 Ack=425 Win=6912 Len=0 TSV=355057029 TSER=26878029
    145 2.073932    10.4.1.6              10.3.4.90             TCP      http > 
60469 [ACK] Seq=425 Ack=32 Win=65505 Len=0 TSV=26878029 TSER=355057029
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 8
fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(8, {sa_family=AF_INET, sin_port=htons(80), 
sin_addr=inet_addr("10.4.1.6")}, 16) = -1 EINPROGRESS (Operation now in 
progress)
sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL, 
NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_ADD, 8, {EPOLLOUT, {u32=8, u64=8}}) = 0
accept(7, 0x7fff72ffb770, [29422312683995264]) = -1 EAGAIN (Resource 
temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
accept(6, 0x7fff72ffb660, [30493168289972352]) = -1 EAGAIN (Resource 
temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLIN, {u32=6, u64=6}}) = 0
accept(4, 0x7fff72ffb660, [30325492766736512]) = -1 EAGAIN (Resource 
temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_ADD, 4, {EPOLLIN, {u32=4, u64=4}}) = 0
epoll_wait(3, {{EPOLLOUT, {u32=8, u64=8}}}, 9, 993) = 1
sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL, 
NULL, 0) = 29
recvfrom(8, 0x69f800, 16384, 256, 0, 0) = -1 EAGAIN (Resource temporarily 
unavailable)
epoll_ctl(3, EPOLL_CTL_MOD, 8, {EPOLLIN, {u32=8, u64=8}}) = 0
epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 9, 992) = 1
recvfrom(8, "HTTP/1.1 200 OK\r\nConnection: close\r\nDate: Wed, 10 Feb 2010 
13:10:04 GMT\r\nServer: Microsoft-IIS/6.0\r\nX-Powered-By: 
ASP.NET\r\nContent-type: text/html\r\nPage-Completion-Status: 
Normal\r\nPage-Completion-Status: Normal\r\nSet-Cookie: CFID=8654065; 
expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\nSet-Cookie: 
CFTOKEN=85114548; expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\n\r\n 
\r\n\r\n</head>\r\n\r\n<body lang=EN-US 
style='tab-interval:36.0pt'>\r\n\r\n<div class=Section1>\r\n\r\n<p 
class=MsoNormal><span class=GramE><span 
lang=EN-GB>..........</span></span><span\r\nlang=EN-GB> test page 
</span></p>\r\n\r\n<p class=MsoNormal><span 
lang=EN-GB><o:p>&nbsp;</o:p></span></p>\r\n\r\n<p class=MsoNormal><span 
class=SpellE><span class=GramE><span 
lang=EN-GB>gCokay</span></span></span><span\r\nlang=EN-GB> 
</span></p>\r\n\r\n<p class=MsoNormal><span 
lang=EN-GB><o:p>&nbsp;</o:p></span></p>\r\n\r\n<p class=MsoNormal><span 
lang=EN-GB>website 
okay</span></p>\r\n\r\n</div>\r\n\r\n\r\n</body>\r\n\r\n</html>\r\n", 16384, 
MSG_WAITALL, NULL, NULL) = 944
close(8)                                = 0


socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 8
fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(8, {sa_family=AF_INET, sin_port=htons(80), 
sin_addr=inet_addr("10.4.1.6")}, 16) = -1 EINPROGRESS (Operation now in 
progress)
sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL, 
NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_ADD, 8, {EPOLLOUT, {u32=8, u64=8}}) = 0
epoll_wait(3, {{EPOLLOUT, {u32=8, u64=8}}}, 9, 988) = 1
sendto(8, "GET /gccheck.cfm HTTP/1.0\r\n\r\n", 29, MSG_DONTWAIT|MSG_NOSIGNAL, 
NULL, 0) = 29
recvfrom(8, 0x69f800, 16384, 256, 0, 0) = -1 EAGAIN (Resource temporarily 
unavailable)
epoll_ctl(3, EPOLL_CTL_MOD, 8, {EPOLLIN, {u32=8, u64=8}}) = 0
epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 9, 988) = 1
recvfrom(8, "HTTP/1.1 200 OK\r\nConnection: close\r\nDate: Wed, 10 Feb 2010 
13:10:06 GMT\r\nServer: Microsoft-IIS/6.0\r\nX-Powered-By: 
ASP.NET\r\nContent-type: text/html\r\nPage-Completion-Status: 
Normal\r\nPage-Completion-Status: Normal\r\nSet-Cookie: CFID=8654066; 
expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\nSet-Cookie: 
CFTOKEN=47946190; expires=Sun, 27-Sep-2037 00:00:00 GMT; path=/; \r\n\r\n", 
16384, MSG_WAITALL, NULL, NULL) = 368
close(8)                                = 0

Reply via email to