Re: CPU 100% when waiting for the client timeout

Willy Tarreau Fri, 20 Nov 2015 03:22:45 -0800

Hi BaiYang,

On Fri, Nov 20, 2015 at 06:59:12PM +0800, baiyang wrote:
> Hi Willy, 
> 
> > This one seems to have missed 3 years of bugfixes
> I've just done an "apt-get update && apt-get upgrade" successfully and
> rebooted the machine this week. I think the OS is fresh enough, but I'll
> try to upgrade the kernel to a newer one. :-)


But the kernel's build date is from 2012, that's what troubles me. Are you
sure you're not running a locally built kernel that is no longer updated,
or something like that?

> > I find this very strange here because the request time is high, indicating 
> > a reused keep-alive connection.
> Yes, we enabled http keep-alive with a 5 minutes idle timeout. You can get my 
> full config file in my previous mail.

OK.

> Also, we are using the long-polling mechanism to do the message push
> (comet) task. A long-polling request times out (returning an empty HTTP
> response) after 285 seconds if there is no message to push to the client.
> Of course it can return earlier if there is something to notify the
> client about.

OK.

> The HTTP long-polling API is "/FFC/App/lpWaitMessage", and it should only
> appear in the "https-in" front-end. If you saw a large request time, as you
> said, it may be caused by the reuse of a connection that previously carried
> a long-polling request.

Ah very good point indeed.

> > When the problem happens, could you please try to dump the output of "show 
> > sess all" sent on the CLI ?
> > So that makes me think that the output of one second of strace when the 
> > trouble appears will also help 
> It's very difficult, because I cannot predict accurately when it will
> happen. Anyway, I will give it a try.

I understand, don't worry. I'm not asking you to try the impossible,
but since you seem to have the only environment which manages to
trigger this issue, your help here is appreciated.
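
For reference, a minimal sketch of how the "show sess all" dump could be
captured from a script as soon as the CPU spike is noticed. It assumes a
"stats socket" is configured in haproxy and reachable as a UNIX socket; the
helper name haproxy_cli and the socket path in the usage note are
hypothetical and must be adapted to the local setup.

```python
import socket


def haproxy_cli(sock_path, command, timeout=5.0):
    """Send one command to a haproxy stats socket and return the reply.

    sock_path is an assumption: use whatever path the local
    "stats socket" directive points to.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(sock_path)
        # The CLI expects one command terminated by a newline.
        s.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            # In non-interactive mode the socket is closed after the reply,
            # so read until end of stream.
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")
```

Something like print(haproxy_cli("/var/run/haproxy.sock", "show sess all"))
could then be run from a watchdog that fires when CPU usage stays high.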

> > I remember having seen a kernel bug affecting epoll a long time ago where a 
> > deleted event was still reported...
> Hmm... I don't think it's a kernel issue here. Because:
> 1. As I said, I've just updated it.

That's the point that makes me doubt, due to the build date. I don't
deny that you update your distro. I mean that I don't know whether the
kernel package slipped through the cracks, or whether the boot loader
really picks the new kernel rather than an older one that was installed
3 years ago and is still the default boot entry.
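
A quick way to check this is to compare the kernel that is actually running
with the kernel images present on disk. The sketch below is only an
illustration: the helper name kernel_report is made up, and the
/boot/vmlinuz-* naming is an assumption that matches the usual Debian/Ubuntu
layout.

```python
import glob
import os


def kernel_report(boot_dir="/boot"):
    """Compare the running kernel with the kernel images found on disk.

    Assumes images are named vmlinuz-<release> under boot_dir.
    """
    u = os.uname()
    prefix = "vmlinuz-"
    installed = sorted(
        os.path.basename(path)[len(prefix):]
        for path in glob.glob(os.path.join(boot_dir, prefix + "*"))
    )
    return {
        "running_release": u.release,   # same as `uname -r`
        "running_build": u.version,     # same as `uname -v`, has the build date
        "installed": installed,
        "running_is_installed": u.release in installed,
    }
```

If "installed" lists a more recent release than "running_release", the boot
loader is probably still starting the old kernel.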

> 2. Our backend servers all use the same Ubuntu 12.04 LTS Server image,
> and all of them use epoll + non-blocking IO + a thread pool on Linux.
> It has been working fine for many years, and we have never seen any
> trouble with it.

Just like almost nobody else faces this bug except the guy who
reported it.

> 3. Even if epoll_wait has returned an invalid file descriptor, we still
> get into CPU exhaustion here.

That's the case I was explaining, which I saw with my own eyes. The
principle is very simple. You subscribe an fd to epoll using
epoll_ctl(ADD) and you poll for it. You need to know that internally
the kernel doesn't use the fd itself: it adds an event to the file
pointed to by the fd, and stores the fd as the value associated with
that event. Later you decide to close the fd. The event is supposed
to be removed at that point because you closed the file's last user,
but on some older buggy kernels it was not always removed, resulting
in epoll_wait() still reporting events for that fd. And the problem
is that since the fd is already closed, whatever operation you try
to perform on it returns EBADF, including epoll_ctl(DEL). So you end
up busy polling. The process still works well, but you can't prevent
epoll_wait() from permanently reporting activity on this fd, and you
can't change this fd's status.
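
The symptom can be illustrated on a healthy kernel, without the bug itself.
The sketch below relies on a different, documented epoll behaviour (see
epoll(7)): if another descriptor created with dup() still refers to the same
open file, closing the registered fd does not remove the event. It is written
in Python with select.epoll for brevity and is Linux-only; the function name
stale_epoll_event is made up for this example.

```python
import errno
import os
import select


def stale_epoll_event():
    """Show epoll reporting an fd number that is already closed."""
    r, w = os.pipe()
    keeper = os.dup(r)      # second reference keeps the open file alive
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN)   # level-triggered by default
        os.write(w, b"x")                # make the read side readable
        os.close(r)                      # the fd number is gone, the file is not

        # epoll still reports activity for the closed fd number.
        reported = [fd for fd, _ in ep.poll(0.1)]

        # Any operation on that fd now fails with EBADF.
        try:
            os.read(r, 1)
            read_error = None
        except OSError as exc:
            read_error = errno.errorcode.get(exc.errno)

        # In this dup() case, closing the last reference removes the event.
        os.close(keeper)
        keeper = -1
        after = ep.poll(0.1)
    finally:
        if keeper >= 0:
            os.close(keeper)
        os.close(w)
        ep.close()
    return r, reported, read_error, after
```

In the dup() case there is a way out, since closing the last reference
removes the event. With the kernel bug described above there is no such
reference left to close, so the polling loop keeps spinning.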

> As far as I can remember:
> 3.1. epoll can work in two modes: "edge" triggered and "level" triggered.
> If we are in level triggered mode, we must set the EPOLLONESHOT flag to
> avoid duplicate events.

Absolutely not. You *may* use it on systems that support it. And doing so
comes with a cost as you have to resubscribe every time you drain that
event. EPOLLONESHOT is in fact a softer alternative to EPOLLET that is
easier to implement in software that mostly needs EPOLLET but fears the
loss of certain events when the code is not perfectly structured. Here
we do the opposite and work in the "old way". Don't forget that we're
a proxy and passing data from one side to another. So in practice you
receive a read event, read the data, send the data on the other side
and wait again. Since the loop always sends you back to the same state
as before the event, you certainly don't want EPOLLONESHOT as you want
to remain subscribed. And we pay the price of epoll_ctl() only when the
situation changes.
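
The difference in cost can be sketched with a small experiment. This is an
illustration only, not haproxy's code: it uses Python's select.epoll on a
pipe (Linux-only), and the function name count_events is made up. Each round
writes one byte, polls, and drains what was reported.

```python
import os
import select


def count_events(oneshot, rearm, rounds=3):
    """Count read events seen over several write/poll/read rounds."""
    r, w = os.pipe()
    ep = select.epoll()
    flags = select.EPOLLIN
    if oneshot:
        flags |= select.EPOLLONESHOT
    seen = 0
    try:
        ep.register(r, flags)
        for _ in range(rounds):
            os.write(w, b"x")
            for fd, _event in ep.poll(0.1):
                os.read(fd, 1)          # drain the event
                seen += 1
                if rearm:
                    # Extra epoll_ctl(MOD) needed after every one-shot event.
                    ep.modify(fd, flags)
    finally:
        ep.close()
        os.close(r)
        os.close(w)
    return seen
```

Plain level-triggered mode, count_events(False, False), should see all 3
events with a single registration. With EPOLLONESHOT and no re-arming,
count_events(True, False), only the first event is delivered, and
count_events(True, True) gets all 3 back at the price of one modify() call
per event.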

> Otherwise we need to set EPOLLET flag to turn it to edge triggered mode.
> 3.2. The "events" field of the epoll event should be carefully checked before
> using it.
> 3.3. Using epoll on a full-duplex connection needs some special care, or
> bad things may occur in some special event sequences (e.g.: a read after a
> write fault, a write after a read fault, etc.).
> 3.4. Maybe we have a loop that manipulates an invalid file descriptor
> incorrectly (e.g.: are we correctly processing all possible return values
> and error codes of system calls like read, write, close, etc.)?

Don't worry, we know all this quite well. I'm not denying that there may
be a bug in haproxy, it's very possible, I'm just saying that you *appear*
to have a kernel which contains bugs which *could* cause this, and that
nobody else reports such issues. So the probability on this side seems
high and needs further investigation. That doesn't prevent us from searching,
but in parallel I want to be certain that the kernel doesn't have these
bugs, and for now everything suggests they are still present.

> > I find this quite strange but given that we had no such report in more than 
> > one year of 1.5 deployed everywhere
> I remember it first appeared after I upgraded haproxy to 1.6.2. At that
> time, I believe I added some options to the config file:
>     timeout client-fin      10s
>     timeout server-fin      5s
>     http-reuse always
> The "http-reuse" option can be ruled out because it's 1.6+ only and I
> disabled it when I downgraded to 1.5.15 and 1.5.14.
> I have now upgraded back to 1.6.2 with all three of these options disabled,
> and am waiting to see what happens. If it appears again, I'll upgrade the
> Linux kernel to the newest one for Ubuntu 12.04.

Just out of curiosity, do you remember if you rebooted after upgrading
haproxy to 1.6 the first time? It might be that your previous kernel
was up to date and that since the reboot you've been running an old
one, which triggers the bug and would explain why you're still seeing
it even after reverting to 1.5.

Cheers,
Willy
