Hi there,
> when the problem happens, does haproxy still serve traffic but at 100% CPU or
> does it spin on itself and stop processing traffic ?
No, the proxy service is not hang even in the spike time. you can verify it by
check "haproxy.log" in my first mail. There are a lot of reqeusts have been
processed normally even under the busy time.
> could it be that sometimes your processes are reloaded ?
I don't very clear what's your means, If you are suspecting the haproxy
process: I'm pretty sure there is no any reload when the symptoms appear.
Again, you can check the pidstat.log and haproxy.log to verify it.
If you mean the backend server: We have been checked the two servers involved
in this case. And:
1.The server will emit if it has been reloaded or restarted, but there are no
such records even in a whole day.
2.We have been located the problematic application level session:
[2015-11-19 08:20:52.313756 UTC+0800][Info] (audit.fend) user logged on from
58.246.225.90
uid = xxx
number = xxx
user agent = Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like
Gecko) Chrome/31.0.1650.63 Safari/537.36
@{tid = xxxxxx}
@{nid = 0}
And there is pretty normal just before and after this log record.
3.We have been reviewed our distributed coordinate service (something like
zookeeper and etcd), and there is no log to show the serivce has any down time
today.
PS: I remembered the new version of haproxy has some strange statistics maybe
related to this issue:
http://stackoverflow.com/questions/33168469/whats-the-exact-meaning-of-session-in-haproxy/33540826?noredirect=1#comment55014864_33540826
Thanks :-)
--
Best Regards
BaiYang
[email protected]
http://baiy.cn
**** < END OF EMAIL > ****
From: Willy Tarreau
Date: 2015-11-20 02:21
To: baiyang
CC: Lukas Tribus; haproxy
Subject: Re: RE: CPU 100% when waiting for the client timeout
Hello,
On Fri, Nov 20, 2015 at 12:09:03AM +0800, baiyang wrote:
> > Which release did you use previously that worked fine?
> I'm using 1.5.14 before upgraded to 1.6.2. I believe everythins is ok before
> the last upgrade. I download it from launchpad.net.
That's quite problematic. I've just reviewed the changelog between 1.5.14 and
1.5.15 and am not seeing anything there which could explain this. Just to be
clear, when the problem happens, does haproxy still serve traffic but at 100%
CPU or does it spin on itself and stop processing traffic ?
I would guess the former, which could be a but in a timeout handling somewhere.
Another point while I'm thinking about it, could it be that sometimes your
processes are reloaded ? I mean, if one of the last request takes 15mn to
complete and you perform a soft reload, the process will not exit until
this request is completed. And by definition, when it completes, it's the
last one in the log before the process tries to leave. In this case you'll
see another process in parallel still doing the job (the new one).
Willy