Hi Wolfgang,

On Tue, Apr 03, 2012 at 05:20:12PM +0200, Wolfgang Engel wrote:
> Hi Willy,
> 
> That sounds interesting, because we are using a Cisco firewall as well, so
> the issue might be related to it.
> Our current situation is that we switched back to apache2 with
> mod_balancer, because a datacenter move is going on and we don't have
> enough time to investigate; we have to keep things stable until then.
> Since we switched back, our users haven't experienced upload/download
> problems anymore. I'm not sure why it is working now. I will have more
> time to investigate after our datacenter move, which will be around
> June, since we are planning to switch back to haproxy then. If I can
> help with further investigation in June, please let me know whether I
> can provide you with more data; at the very least I can run more tests
> on our firewall to make sure that we don't have an issue there.
> 
> Please find the dump of a failing download at 
> ftp://ftp.suse.com/pub/people/wengel/haproxy/haproxy-download.dump

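In this trace, I'm noticing some packet losses. For instance, in the excerpt
below, 9.5 kB (9576 bytes) were lost between sequence numbers 90139223 and
90148799: the client keeps acknowledging 90139223 even after the segment
starting at 90148799 arrives.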

17:18:57.020221 IP (tos 0x0, ttl 54, id 31322, offset 0, flags [DF], proto TCP (6), length 2788)
    130.57.19.200.80 > 10.10.11.45.41965: ., cksum 0xb60e (incorrect (-> 0x51db), 90136487:90139223(2736) ack 4167216578 win 5
17:18:57.020247 IP (tos 0x0, ttl 64, id 60625, offset 0, flags [DF], proto TCP (6), length 52)
    10.10.11.45.41965 > 130.57.19.200.80: ., cksum 0xab5e (incorrect (-> 0x9ef2), 4167216578:4167216578(0) ack 90139223 win 4827
17:18:57.020469 IP (tos 0x0, ttl 54, id 31331, offset 0, flags [DF], proto TCP (6), length 1420)
    130.57.19.200.80 > 10.10.11.45.41965: ., cksum 0x6896 (correct), 90148799:90150167(1368) ack 4167216578 win 54 <nop,nop,ti
17:18:57.020486 IP (tos 0x0, ttl 64, id 60626, offset 0, flags [DF], proto TCP (6), length 52)
    10.10.11.45.41965 > 130.57.19.200.80: ., cksum 0xab5e (incorrect (-> 0x9ef2), 4167216578:4167216578(0) ack 90139223 win 4827
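The size of the hole can be checked directly from the "start:end(len)" fields. The sketch below is a minimal illustration, not a tool from this thread: it works on trimmed copies of the four lines above, the helper name find_gaps is hypothetical, and it ignores 32-bit sequence-number wraparound. Assuming the 1368-byte payload of the third packet is the segment size, the hole is exactly 7 full segments.

```python
import re

# Trimmed copies of the trace lines above; only "start:end(len)" matters here.
TRACE = [
    "130.57.19.200.80 > 10.10.11.45.41965: ., 90136487:90139223(2736) ack 4167216578",
    "10.10.11.45.41965 > 130.57.19.200.80: ., 4167216578:4167216578(0) ack 90139223",
    "130.57.19.200.80 > 10.10.11.45.41965: ., 90148799:90150167(1368) ack 4167216578",
    "10.10.11.45.41965 > 130.57.19.200.80: ., 4167216578:4167216578(0) ack 90139223",
]

SEQ_RE = re.compile(r"(\d+):(\d+)\((\d+)\)")


def find_gaps(lines):
    """Return (start, end) byte ranges missing between consecutive data segments.

    Hypothetical helper; sequence-number wraparound is not handled.
    """
    gaps = []
    expected = None  # next sequence number we expect to see
    for line in lines:
        m = SEQ_RE.search(line)
        if not m:
            continue
        start, end, length = (int(x) for x in m.groups())
        if length == 0:  # pure ACKs carry no payload
            continue
        if expected is not None and start > expected:
            gaps.append((expected, start))
        expected = end if expected is None else max(expected, end)
    return gaps


gaps = find_gaps(TRACE)
for start, end in gaps:
    size = end - start
    print(f"lost {start}..{end}: {size} bytes = {size / 1368:g} segments of 1368 bytes")
```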

Your server has TCP SACK disabled, so it cannot know which segments the client
has already received without wasting a few RTTs. It therefore has to
retransmit a lot of data, which is why you observe so many retransmits. You
should really (*really*) enable TCP SACK (net.ipv4.tcp_sack=1) to avoid this
and to reduce recovery time after losses.
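To make the difference concrete, here is a deliberately simplified model, not a description of the Linux TCP stack: without SACK the sender only sees the cumulative ACK, and in this go-back-N style model it resends everything from the first hole; with SACK the receiver reports which segments it holds, so only the holes are resent. The 20-segment flight is a hypothetical number; the 7 lost segments mirror the hole in the trace. A real NewReno sender without SACK would more typically repair about one lost segment per RTT, which trades bandwidth for latency instead.

```python
def resend_without_sack(sent: int, received: set[int]) -> list[int]:
    """Go-back-N model: only the cumulative ACK is known, so resend from the first hole."""
    first_hole = next((i for i in range(sent) if i not in received), sent)
    return list(range(first_hole, sent))


def resend_with_sack(sent: int, received: set[int]) -> list[int]:
    """SACK model: the receiver reports which segments it holds, so resend only the holes."""
    return [i for i in range(sent) if i not in received]


# Hypothetical flight of 20 segments in which segments 2..8 (7 segments,
# like the 9576-byte hole in the trace) are lost.
SENT = 20
RECEIVED = set(range(SENT)) - set(range(2, 9))

no_sack = resend_without_sack(SENT, RECEIVED)
with_sack = resend_with_sack(SENT, RECEIVED)
print(f"without SACK: {len(no_sack)} segments resent, with SACK: {len(with_sack)}")
```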

It is possible that the machine running Apache has SACK enabled, which would
result in a significant drop in the number of retransmits.
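One way to confirm this is to compare the setting on both hosts. On Linux the sysctl is exposed as /proc/sys/net/ipv4/tcp_sack; the small helper below (sack_enabled is a hypothetical name) simply reads that file, and running `sysctl net.ipv4.tcp_sack` on each machine gives the same answer.

```python
from pathlib import Path

# Linux location of the net.ipv4.tcp_sack sysctl.
SYSCTL_PATH = "/proc/sys/net/ipv4/tcp_sack"


def sack_enabled(path: str = SYSCTL_PATH) -> bool:
    """Return True if the tcp_sack sysctl at `path` holds a non-zero value."""
    return int(Path(path).read_text().strip()) != 0


# Only query the live value when running on a Linux host that exposes it.
if Path(SYSCTL_PATH).exists():
    print("net.ipv4.tcp_sack enabled:", sack_enabled())
```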

I'm also a bit worried about the final RST segment that comes right after some
data. Do you happen to have "option nolinger" in your configuration? It could
cause exactly this and result in truncated responses.
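The mechanism can be reproduced on a Linux loopback connection. This is a minimal sketch assuming that "option nolinger" amounts to setting SO_LINGER with l_onoff=1 and l_linger=0 before close(); the helper send_then_abort is hypothetical. With a zero linger timeout, close() discards whatever is still queued for sending and emits an RST instead of a FIN, so the peer receives less than the application wrote.

```python
import select
import socket
import struct


def send_then_abort() -> tuple[int, int]:
    """Fill a loopback connection, close with SO_LINGER 0, count bytes delivered."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()

    # Queue data while the client does not read: keep writing while the
    # socket stays writable; once the client's receive window and our send
    # buffer are both full, select() times out.
    server.setblocking(False)
    chunk = b"x" * 65536
    sent = 0
    while select.select([], [server], [], 0.2)[1]:
        try:
            sent += server.send(chunk)
        except BlockingIOError:
            pass

    # l_onoff=1, l_linger=0: close() drops unsent data and sends an RST.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    server.close()

    received = 0
    try:
        while True:
            data = client.recv(65536)
            if not data:
                break
            received += len(data)
    except ConnectionResetError:
        pass  # the RST surfaces as ECONNRESET on the reading side
    finally:
        client.close()
    return sent, received


sent, received = send_then_abort()
print(f"queued {sent} bytes, client got {received} bytes")
```

Without the setsockopt() call, close() would send a FIN after the queued data and the client would eventually read all of it.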

Regards,
Willy

