Hi guys,

On Fri, Feb 13, 2009 at 08:04:50AM -0500, John Lauro wrote:
> It wouldn't hurt to put RHEL 5 or CentOS 5 on the box instead of FC.   FC is
> generally meant for desktops rather than servers.

A customer has encountered a similar issue a few times on RHEL3, on
machines which had swap enabled. It would happen after about 6 months in
production: haproxy would stop receiving requests for long periods
(several seconds), and this happened most frequently during network
backups.

We also saw a few occurrences in the middle of the day, while the admins
were grepping the logs for errors. CPU usage was high, so at first we
suspected scheduling issues. But when we noticed the swap usage, we
figured that some of the process's data structures might have been
swapped out, causing long delays whenever they were accessed.
Interestingly, restarting the process was enough to make the issue go
away, since its memory usage was much lower after a restart.

The swapping was not caused by a lack of RAM but by heavy use of the disk
cache, which pushed rarely used data out to swap.

And I agree with you John, a "swapoff -a" must absolutely be done.
There's not a single valid reason to enable swap on a network server;
all it can do is delay every operation and kill performance.
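
A quick way to check whether a box is in this situation is to compare the
SwapTotal and SwapFree lines of /proc/meminfo ("free -m" shows the same
figures). The sketch below is only illustrative; the helper name
swap_used_kb is made up for the example.

```shell
# Hypothetical helper: print the amount of swap currently in use, in kB,
# computed from the SwapTotal and SwapFree lines of /proc/meminfo.
swap_used_kb() {
  awk '$1 == "SwapTotal:" { t = $2 }
       $1 == "SwapFree:"  { f = $2 }
       END { print t - f }' /proc/meminfo
}

used=$(swap_used_kb)
if [ "$used" -gt 0 ]; then
  echo "swap in use: ${used} kB (some process pages may be out of RAM)"
else
  echo "no swap in use"
fi
```

After a "swapoff -a", the helper should report 0.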

> Your default ulimit -n is only 1024.  Just make sure you raise that to match
> or exceed your Haproxy configuration prior to starting Haproxy.  Even if
> that is a problem, it wouldn't explain why you have a problem when looking
> at the logs.

It is not a problem if haproxy is started as root, since it then adjusts
ulimit-n itself. And you're right, it would not explain a problem that
shows up while looking at the logs.
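
When haproxy is started as a non-root user instead, it has to live with
the limit it inherits, so a pre-start check along these lines can catch a
too-low value early. This is a hypothetical sketch: the function name and
the example value are assumptions, and sizing the limit to roughly twice
maxconn (one socket on each side of every proxied connection) plus some
margin is a rule of thumb, not an exact formula.

```shell
# Hypothetical pre-start check: compare the soft open-file limit that
# haproxy would inherit with the number of descriptors it needs.
# The argument is an example value; size it from the configured maxconn.
check_fd_limit() {
  needed=$1
  soft=$(ulimit -Sn)
  if [ "$soft" = unlimited ] || [ "$soft" -ge "$needed" ]; then
    echo "ok: open-file limit is $soft"
  else
    echo "too low: open-file limit is $soft, need $needed (raise it with ulimit -n)"
  fi
}

# Example: maxconn 4000 needs at least 2 * 4000 descriptors plus some margin.
check_fd_limit 8192
```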

> The grep on /var/messages completed too quickly to really catch much.  That
> said, your SYS time is a little high, especially after it finished.  For an
> 8 core box, 12.5% would mean one core fully dedicated to the task, and it
> rose from 4 to 16.  The fact that it was counted as sys rather than user,
> and generated little I/O, indicates it might be slow memory processing on
> the cache.

Other I/O-intensive workloads such as "wc -l /var/log/*" might help to see
whether swap usage suddenly grows.
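
One way to quantify this is to sample the kernel's cumulative swap
counters (pswpin and pswpout in /proc/vmstat, counted in pages) before and
after such a workload; "vmstat 1" running in another terminal shows the
same activity live in its si/so columns. A minimal sketch, with a made-up
helper name:

```shell
# Hypothetical helper: print the cumulative number of pages swapped in and
# out since boot, as reported by the pswpin/pswpout lines of /proc/vmstat.
swap_counters() {
  awk '$1 == "pswpin"  { i = $2 }
       $1 == "pswpout" { o = $2 }
       END { print i + 0, o + 0 }' /proc/vmstat
}

set -- $(swap_counters); in0=$1; out0=$2
# The I/O-heavy workload from above; its output and any errors (unreadable
# files, subdirectories) are discarded, only the page cache pressure matters.
wc -l /var/log/* >/dev/null 2>&1
set -- $(swap_counters); in1=$1; out1=$2

echo "during the workload: $((in1 - in0)) pages swapped in, $((out1 - out0)) swapped out"
```

A large swap-out count during a simple read workload would point to the
cache pushing process memory out of RAM.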

Another test which might be done once the problem becomes reproducible is
to flush the caches and disable all swap:

    # sync
    # echo 1 >/proc/sys/vm/drop_caches
    # swapoff -a

Then redo the operation. If the problem no longer happens, that clearly
points to a poor tradeoff between swap and cache.

Regards,
Willy
