Hi,

On Thu, Jul 16, 2009 at 03:52:16AM -0700, Hank A. Paulson wrote:
> I have a machine with 1 GB RAM and a Core Duo 2 processor running only 
> haproxy (and necessary system processes) including rsyslog sending haproxy 
> logs to other machines. No iptables active/loaded.
> 
> Every minute the "packets collapsed in receive queue due to low socket 
> buffer" jump by several hundred/thousand, I have tried some sysctl tuning, 
> but am at a loss. It is a bit disillusioning to have things bumping into a 
> limit at this low of traffic level.
> 
> The system is running about 25 Mbps continuously. Any help is appreciated.

I have never seen this one go that high. There is another counter which is
also very high in your case, probably for the same reason:

>     2364480 packets pruned from receive queue because of socket buffer overrun
>     154779642 packets collapsed in receive queue due to low socket buffer

I think that one of the reasons might be that your TCP memory parameters
are not correctly tuned:

> net.ipv4.tcp_rmem = 8192 99380 16777216
> net.ipv4.tcp_wmem = 4096 75536 16777216
> net.ipv4.tcp_mem = 179000 999000 1972000

See tcp_mem? The units are pages (4 kB) for this one, not bytes. It says
that by default, you assign about 4 GB of RAM to the TCP stack, that it
can go as high as about 8 GB if required, but can shrink to roughly
730 MB in case of memory shortage.
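For reference, the three tcp_mem page counts above can be converted to
bytes with a quick sketch (assuming the usual 4 kB page size on x86):

```python
# Convert the posted tcp_mem values (min, pressure, max) from pages
# to decimal gigabytes, assuming a 4 kB page size.
PAGE = 4096

def pages_to_gb(pages):
    """Return the size in decimal gigabytes for a page count."""
    return pages * PAGE / 1e9

for label, pages in [("min", 179000), ("pressure", 999000), ("max", 1972000)]:
    print(f"{label}: {pages_to_gb(pages):.2f} GB")
# min: 0.73 GB, pressure: 4.09 GB, max: 8.08 GB
```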

The other ones indicate that you have about 100 and 75 kB of RAM per
socket by default, respectively, but that if memory permits, each socket
can reach as high as 16 MB. So I think your TCP windows are quickly
announced as very large because the system *thinks* it can afford them;
then when a transfer occurs, memory quickly runs short, the system
discovers it has to shrink buffers a lot, and data is lost on the
receive path.

You should leave these at their default values in my opinion; they are
not bad at all. BTW, it is recommended that the default tcp_wmem be a
multiple of the page size (4 kB) and that the default tcp_rmem be a
multiple of your most common MSS (1460 on Ethernet). You probably meant
to put 99280 (68*1460) here, not 99380, which does not seem to be a
multiple of any common value.
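A quick check of the posted values against those recommended multiples
(a sketch, assuming the usual 4 kB page size and a 1460-byte Ethernet MSS):

```python
# Check whether a buffer default lines up with the recommended unit:
# the MSS for tcp_rmem, the page size for tcp_wmem.
PAGE_SIZE = 4096   # typical x86 page size
MSS = 1460         # typical Ethernet MSS

def is_multiple(value, unit):
    """True if value is an exact multiple of unit."""
    return value % unit == 0

print(is_multiple(99380, MSS))   # posted tcp_rmem default -> False
print(is_multiple(99280, MSS))   # suggested 68 * 1460     -> True
```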

Also, be careful: your net.ipv4.tcp_fin_timeout is very low, though
this is unrelated to your current problem. Try to always stay above
25 s if you want to avoid issues with sockets not getting correctly
closed on remote systems.

Regards,
Willy
