Hi Dominik,

On Sun, Apr 01, 2012 at 01:43:31PM +0200, Mostowiec Dominik wrote:
> Hi,
> 
> >>     maxconn 163937
> > What's the reason for this magic number ?
> It's random :-)

OK

> > Did you notice that your request packet (the 4th) was lost on the network ?
> > I guess you captured on the siege_host
> 
> I captured this on loadbalancer host :-( It's not network loses.

So please check the network stats using "netstat -s". Something is causing
incoming packets to be dropped, and that really does not make sense at all.
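
As a rough sketch of that check: on Linux, the counters printed by
"netstat -s" are read from /proc/net/snmp and /proc/net/netstat, so a
snapshot taken before and after a test run can be compared to see which
counters moved. The function names below are made up for illustration, and
the code only assumes the usual layout of those files (a line of field names
followed by a line of values, both with the same prefix).

```python
def parse_counters(text):
    """Parse /proc/net/snmp-style text into {"Proto.Field": value}."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: "Tcp: name name ..." then "Tcp: value value ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        proto2, nums = values.split(":", 1)
        if proto != proto2:
            raise ValueError(f"mismatched lines: {proto!r} vs {proto2!r}")
        for name, num in zip(names.split(), nums.split()):
            counters[f"{proto}.{name}"] = int(num)
    return counters


def changed_counters(before, after):
    """Return {name: delta} for every counter whose value changed."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


def snapshot(paths=("/proc/net/snmp", "/proc/net/netstat")):
    """Read the live counters (Linux only)."""
    counters = {}
    for path in paths:
        with open(path) as f:
            counters.update(parse_counters(f.read()))
    return counters
```

Calling snapshot() before and after a siege run and passing both results to
changed_counters() lists only the counters that moved. Which one (if any)
matches the lost packet is not something this sketch can tell in advance.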

> > you did not have -vv nor -S so some info are missing
> I recorded this to a file, with -vv:

Thank you, it's better now.

> 
> 11:20:58.713922 IP (tos 0x0, ttl 64, id 7370, offset 0, flags [DF], proto TCP 
> (6), length 48)
>     siege_host.46589 > loadbalancer.8123: Flags [S], cksum 0xe536 (correct), 
> seq 1849604553, win 14600, options [mss 1460,nop,wscale 4], length 0
> 11:20:58.713951 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP 
> (6), length 48)
>     loadbalancer.8123 > siege_host.46589: Flags [S.], cksum 0x683e (incorrect 
> -> 0x7e18), seq 121266129, ack 1849604554, win 14600, options [mss 
> 1460,nop,wscale 6], length 0
> 11:20:58.714687 IP (tos 0x0, ttl 64, id 7371, offset 0, flags [DF], proto TCP 
> (6), length 40)
>     siege_host.46589 > loadbalancer.8123: Flags [.], cksum 0xdf59 (correct), 
> seq 1, ack 1, win 913, length 0
> 11:20:58.714894 IP (tos 0x0, ttl 64, id 7372, offset 0, flags [DF], proto TCP 
> (6), length 190)
>     siege_host.46589 > loadbalancer.8123: Flags [P.], cksum 0x11eb (correct), 
> seq 1:151, ack 1, win 913, length 150

The checksum is correct, the TTL is non-zero, everything looks fine, and
still the packet is being dropped.

> 11:21:00.717226 IP (tos 0x0, ttl 64, id 7373, offset 0, flags [DF], proto TCP 
> (6), length 40)
>     siege_host.46589 > loadbalancer.8123: Flags [F.], cksum 0xdec2 (correct), 
> seq 151, ack 1, win 913, length 0
> 11:21:00.717254 IP (tos 0x0, ttl 64, id 17608, offset 0, flags [DF], proto 
> TCP (6), length 40)
>     loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6836 (incorrect 
> -> 0xe205), seq 1, ack 1, win 229, length 0
> 11:21:01.723109 IP (tos 0x0, ttl 64, id 7374, offset 0, flags [DF], proto TCP 
> (6), length 190)
>     siege_host.46589 > loadbalancer.8123: Flags [P.], cksum 0x11eb (correct), 
> seq 1:151, ack 1, win 913, length 150

And the retransmitted one is exactly the same, even with the same checksum,
but it's accepted this time.

Among the things that come to mind :
  - do you have netfilter loaded ? It should pass this without any issue, but
    we have to find what causes a packet to be lost.
  - check your network sysctls (sysctl -a |fgrep net.), maybe some buffer is
    too small ?
  - check kernel logs (dmesg) to see if you notice anything suspicious.
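
For the sysctl check, a minimal sketch: sysctl names map to files under
/proc/sys with the dots replaced by slashes, so the values can be read
directly. The list of names below is only a guess at which receive-side
settings are worth a look first, not a diagnosis, and read_sysctl() and
report() are made-up helpers. Names containing dots inside a component (some
interface names) are not handled.

```python
import os

# Receive-side settings picked for illustration; this list is an assumption.
BUFFER_SYSCTLS = (
    "net.core.rmem_default",
    "net.core.rmem_max",
    "net.core.netdev_max_backlog",
    "net.ipv4.tcp_rmem",
)


def read_sysctl(name, root="/proc/sys"):
    """Return the value of a sysctl as a string, or None if unreadable."""
    path = os.path.join(root, *name.split("."))
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def report(names=BUFFER_SYSCTLS, root="/proc/sys"):
    """Map each sysctl name to its current value (None when missing)."""
    return {name: read_sysctl(name, root) for name in names}
```

Printing report() on the loadbalancer gives the same values as the
corresponding lines of "sysctl -a", just limited to the few names listed.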

The packet *should* be acked and *should* be delivered to haproxy. There is
no reason it is dropped like this, because the TCP stack did not even notice
it (otherwise it would have been ACKed).

Last, what's your kernel version ? It would surprise me a lot if we were
facing such a big bug in the network stack, but we have to consider every
possibility.

(...)
> Request are retransmitted:

Yes, that's what is observed in your trace: the request is in the PUSH
packet, which is not ACKed. The fact that it is not ACKed indicates that
the packet was not seen by the TCP stack, which is abnormal since it
reached tcpdump at least. Network buffers that are too small could explain
this, but at such low numbers I really doubt it.
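
To make the reasoning on the trace explicit, here is a small sketch that
flags data segments sent again before the peer acknowledged them. The
Segment record and the function name are hypothetical, the sequence numbers
are the relative ones tcpdump prints after the handshake, and the timestamps
are seconds past 11:20:00 copied from the capture above.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    ts: float       # capture time, in seconds
    src: str        # sender of the segment
    seq_start: int  # relative sequence number of the first payload byte
    seq_end: int    # relative sequence number just past the payload
    ack: int        # relative acknowledgment number carried by the segment


def unacked_retransmissions(segments):
    """Return (first_ts, retrans_ts, src, seq_start, seq_end) tuples for
    data segments that were sent again before the peer ACKed them."""
    pending = {}  # (src, seq_start, seq_end) -> time of first transmission
    found = []
    for seg in segments:
        # An ACK from the other side clears every pending segment it covers.
        for key in list(pending):
            sender, _, end = key
            if sender != seg.src and seg.ack >= end:
                del pending[key]
        if seg.seq_end > seg.seq_start:  # the segment carries payload
            key = (seg.src, seg.seq_start, seg.seq_end)
            if key in pending:
                found.append((pending[key], seg.ts) + key)
            else:
                pending[key] = seg.ts
    return found


# The post-handshake packets of the trace, in capture order.
trace = [
    Segment(58.714687, "siege_host", 1, 1, 1),      # ACK of the SYN-ACK
    Segment(58.714894, "siege_host", 1, 151, 1),    # PUSH, the request
    Segment(60.717226, "siege_host", 151, 151, 1),  # FIN
    Segment(60.717254, "loadbalancer", 1, 1, 1),    # ACK still at 1
    Segment(61.723109, "siege_host", 1, 151, 1),    # same request again
]

for first, again, src, start, end in unacked_retransmissions(trace):
    print(f"{src} seq {start}:{end} resent after {again - first:.6f}s")
```

The loadbalancer's ACK at 11:21:00.717254 still carries "ack 1", so the
150 bytes of the request were never counted as received, and the same
segment shows up again about 3 seconds after its first transmission.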

(...)
> > Wow 16 procs ! I don't know what you intend to do, but it will generally
> > not bring anything and might even reduce the performance.
> 
> I have 2x6 core server (24 core in ht).

That doesn't change anything. Workloads consisting of fast connection
setup/teardown do not scale well on multiple cores because there is a
substantial amount of locking in the TCP stack to select a source port,
update counters, etc... And what we're doing is precisely to make this
part work a lot (under load only, here the load was low to moderate).

Multiple cores can help when doing complex processing (ssl, compression)
but not for short sessions.

Willy

