Hey,

I will do more elaborate test runs in the next couple of days. I will create traces with ktrace, which is not as nice as strace but will at least provide more context. Is there anything in particular you'd be interested in, such as syscalls only?

Meanwhile, I have built haproxy with debug symbols, but in the tests I ran today haproxy did not dump core; it only failed the 100% CPU way, and I had to kill it manually. This happened both with httpclose and with keep-alive, so I'd say the problem is not really related to that.

It's so sad, because before the CPU load suddenly rises and requests/connections are no longer handled, haproxy performs so well and effortlessly.

Also, if I can help by providing access to a FreeBSD machine, just let me know. I have plenty :)

If you have any other ideas apart from ktrace and coredumps to make troubleshooting more effective, I'd be more than happy to help.

Kind regards, John

Willy Tarreau wrote:
Hi,

On Tue, May 06, 2014 at 12:02:59PM +0200, Lukas Tribus wrote:
Hi,

I'm currently attempting to replace our commercial Loadbalancer with SSL
termination with haproxy. I'm running it on FreeBSD 9.2 Stable.

We have thousands of requests per second and for a while everything runs
extremely smoothly. No queues are running full, machine load is at 0.5,
haproxy is at 2-3% CPU per process.

Then after an undetermined amount of time one process after another
(we're running with nbproc) is locked up with 100% CPU and does not
recover. Eventually all listen queues are filled to the max and I have
to kill and restart haproxy.
Very bad. You don't have strace on freebsd x64, correct? Can you downgrade
to dev22 in the absence of better advice?

I keep some memories of another similar bad result on FreeBSD a while ago
that we didn't manage to troubleshoot, in part due to the extremely low
number of users, and in part due to the absence of strace which left us
even more blind.

All I know so far is that it happens more frequently in peak traffic
times. Also I have tried to use the option httpclose because I thought
it was related to keep-alive issues. When I have tried this, instead of
locking up at 100%, haproxy suddenly exits with a core dump.
Can you provide backtraces or executable + coredump? I suggest you do
that privately to Willy, as it will contain your private data (like
certificates, etc).

Could you also try to disable kqueue (start with -dk or use "nokqueue" in
the global section). That's one difference between FreeBSD and other OSes,
and at least it will tell us if the bug is outside of it or not. Please
bear in mind that you'll be running with poll() which scales much less
well with large connection counts.
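
For reference, a minimal sketch of the configuration variant, assuming the
existing haproxy.cfg already has a global section (the comment about existing
settings is a placeholder, not part of the original report):

```
global
    # existing settings (e.g. nbproc) stay unchanged
    nokqueue    # disable the kqueue poller; haproxy falls back to poll()
```

Adding -dk to the usual haproxy command line should have the same effect
without touching the configuration, which makes it easy to revert after the
test.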

Also, do you observe the problem only in multi-process mode or also with
a single process?

Regards,
Willy


--
John-Paul Bader | Software Development

www.wooga.com
wooga GmbH | Saarbruecker Str. 38 | D-10405 Berlin
Sitz der Gesellschaft: Berlin; HRB 117846 B
Registergericht Berlin-Charlottenburg
Geschaeftsfuehrung: Jens Begemann, Philipp Moeser
