On Wed, Mar 31, 2010 at 01:06:05AM -0400, Geoffrey Mina wrote:
> Willy,
> Thanks for the response. I have attached a pcap file here off the webserver
> we are trying to load balance to. It is full of errors... but honestly, I
> don't know enough about this level of TCP to know what the problem is.
Those are not real errors. The outgoing packets' TCP checksums are computed by the network card (checksum offloading), so they are not yet correct at the point in the network stack where tcpdump captures the packets. You can safely ignore them.

> Again, the attached file is a tcpdump of port 80 on the webserver.

Your trace shows that the server announces that it is going to send 1213090 bytes:

  HTTP/1.1 206 Partial Content
  Content-Length: 1213090
  Content-Type: application/x-shockwave-flash
  Content-Location: http://173.203.208.131/adminui/IntelliQueueAdmin.swf
  Content-Range: bytes 412748-1625837/1625838

Unfortunately, it stops sending anything after 263938 bytes, after which haproxy's timeout finally strikes. Looking in more detail, we see that the server sends one segment which never manages to reach the haproxy server:

  07:02:21.547517 IP (tos 0x0, ttl 128, id 7675, offset 0, flags [DF], proto TCP (6), length 1500, bad cksum 0 (->ca2c)!)
      173.203.224.217.80 > 173.203.208.131.45278: Flags [.], cksum 0x0cfb (incorrect -> 0xcb49), seq 4060117381:4060118829, ack 1003968144, win 65038, options [nop,nop,TS val 21159278 ecr 11067241], length 1448

This segment is retransmitted multiple times, and the only ACKs sent back to the server acknowledge the previous segment. This indicates that the connection is still alive but that this specific packet cannot get through, as it is never received by the other side.

According to your TTL, you seem to have one component between haproxy and the web server. Is it a firewall? Maybe it's a bit buggy (that used to be very common; it's less so these days). It would be useful to repeat the capture on that component to see whether the same packet passes through it or not. Maybe there is some form of IDS or pattern matching which believes it has found an attack or invalid content in that packet (pattern matching is the worst thing to do; it will always do such nasty things).
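As a quick sanity check on the numbers above: both ends of a Content-Range are inclusive, so 1625837 - 412748 + 1 = 1213090, which matches the announced Content-Length. The minimal Python sketch below performs that check and computes how much of the body never arrived. The helper names range_length and missing_bytes are hypothetical, the parser only handles the plain "bytes FIRST-LAST/TOTAL" form, and it assumes the 263938 bytes count body bytes only, not headers.

```python
import re

# Matches a Content-Range value of the form "bytes FIRST-LAST/TOTAL".
# Assumption: only this simple form is handled (no "*" for unknown lengths).
_RANGE_RE = re.compile(r"bytes (\d+)-(\d+)/(\d+)")


def range_length(content_range: str) -> int:
    """Return the number of body bytes implied by a Content-Range value.

    Hypothetical helper for illustration only.
    """
    m = _RANGE_RE.fullmatch(content_range.strip())
    if m is None:
        raise ValueError(f"unsupported Content-Range: {content_range!r}")
    first, last, total = (int(g) for g in m.groups())
    if not (first <= last < total):
        raise ValueError("inconsistent Content-Range")
    # FIRST and LAST are both inclusive byte positions, hence the +1.
    return last - first + 1


def missing_bytes(content_length: int, received: int) -> int:
    """Bytes still expected when the transfer stalls (hypothetical helper).

    Assumes `received` counts body bytes only.
    """
    return content_length - received


if __name__ == "__main__":
    expected = range_length("bytes 412748-1625837/1625838")
    print(expected)                         # 1213090, same as Content-Length
    print(missing_bytes(expected, 263938))  # 949152
```

The headers are therefore self-consistent, which points away from a bad HTTP response and towards the stuck segment as the reason the transfer never completes.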
Also, when the packet was retransmitted, the server tried to reduce its size to 576 bytes, but this did not work either. That means that if something is wrong, it's within the first 576 bytes of the packet.

Regards,
Willy

