Hi Wolfgang,

On Tue, Apr 03, 2012 at 05:20:12PM +0200, Wolfgang Engel wrote:
> Hi Willy,
>
> that sounds interesting because we are using Cisco firewall as well. So
> that issue might be related to that.
> Our current situation is that we switched back to apache2 with
> mod_balancer since we currently haven't enough time to investigate since
> a datacenter move is going on and we have to keep things stable until then.
> Since we switched back, our users didn't experience upload/download
> problems anymore. Not sure why it is
> working now. I will have more time to investigate after our datacenter
> move, which will be around June since we
> are planning to switch back to haproxy. So if I can do further
> investigation on that in June please let me know if I can provide you
> with more data, or at least I can do more tests regarding our firewall
> to make sure that we don't have an issue there.
>
> Please find the dump of a failing download at
> ftp://ftp.suse.com/pub/people/wengel/haproxy/haproxy-download.dump

In this trace, I'm noticing that you have some packet losses. For
instance, about 9.5 kB were lost below, between 90139223 and 90148799:

  17:18:57.020221 IP (tos 0x0, ttl 54, id 31322, offset 0, flags [DF], proto TCP (6), length 2788)
      130.57.19.200.80 > 10.10.11.45.41965: ., cksum 0xb60e (incorrect (-> 0x51db), 90136487:90139223(2736) ack 4167216578 win 5
  17:18:57.020247 IP (tos 0x0, ttl 64, id 60625, offset 0, flags [DF], proto TCP (6), length 52)
      10.10.11.45.41965 > 130.57.19.200.80: ., cksum 0xab5e (incorrect (-> 0x9ef2), 4167216578:4167216578(0) ack 90139223 win 4827
  17:18:57.020469 IP (tos 0x0, ttl 54, id 31331, offset 0, flags [DF], proto TCP (6), length 1420)
      130.57.19.200.80 > 10.10.11.45.41965: ., cksum 0x6896 (correct), 90148799:90150167(1368) ack 4167216578 win 54 <nop,nop,ti
  17:18:57.020486 IP (tos 0x0, ttl 64, id 60626, offset 0, flags [DF], proto TCP (6), length 52)
      10.10.11.45.41965 > 130.57.19.200.80: ., cksum 0xab5e (incorrect (-> 0x9ef2), 4167216578:4167216578(0) ack 90139223 win 4827

Your server has TCP SACK disabled, so it has to retransmit a lot of
things since it does not know what the client got without wasting a few
RTTs, which is why you observe a lot of retransmits. You should really
(*really*) enable TCP SACK (net.ipv4.tcp_sack=1) to avoid this and to
reduce recovery time on losses. It is possible that the machine running
apache has SACK enabled, resulting in a significant drop in the number
of retransmits.

I'm also a bit worried about the last RST segment just after some data.
Would you happen to have "option nolinger" in your configuration? It
could precisely cause this and result in truncated responses.

Regards,
Willy
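
For anyone wanting to reproduce the gap arithmetic on the trace above, here is a minimal sketch in Python. It assumes the data segments have already been extracted from the dump as (start, end) sequence pairs in capture order, the way tcpdump prints them (end exclusive). The function name find_gaps and the segments list are illustrative only and not part of any tool mentioned in this thread. Sequence number wraparound is deliberately ignored.

```python
def find_gaps(segments):
    """Return (gap_start, gap_end, size) for each hole between segments.

    segments: list of (seq_start, seq_end) tuples in capture order,
    with seq_end exclusive, as printed by tcpdump ("start:end(len)").
    Assumption: no 32-bit sequence wraparound within the list.
    """
    gaps = []
    expected = None  # highest sequence number seen so far
    for start, end in segments:
        if expected is not None and start > expected:
            # Bytes between the highest byte seen and this segment are missing.
            gaps.append((expected, start, start - expected))
        expected = end if expected is None else max(expected, end)
    return gaps


# The two server-to-client data segments quoted in the trace.
segments = [(90136487, 90139223), (90148799, 90150167)]

for gap_start, gap_end, size in find_gaps(segments):
    print(f"missing {size} bytes between {gap_start} and {gap_end}")
```

On the two segments quoted above this reports a single hole of 9576 bytes, which is the roughly 9.5 kB loss mentioned. Without SACK, the client can only keep acknowledging 90139223, so the sender cannot tell which later segments arrived.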

