Hi Conrad,
On Sat, May 17, 2014 at 08:31:46PM +0200, Conrad Hoffmann wrote:
> Hi Willy,
>
> On 05/17/2014 04:39 AM, Willy Tarreau wrote:
> > On 05/15/2014 11:26 AM, Conrad Hoffmann wrote:
> >> If haproxy has been running for a certain amount of time (this part
> >> makes reproduction cumbersome), we quite often run into the following
> >> issue when restarting haproxy: if some of the workers have active
> >> connections that they must wait for to be terminated, a fraction of
> >> those would start hogging CPU like crazy.
> >
> > I had already observed that case myself and thought it was gone because
> > as you explain it's very hard to reproduce.
>
> After sleeping a night over it, I came up with a sure-fire way to
> reproduce it. See below.
>
> >> Attaching a debugger showed
> >> that epoll_wait() would immediately return with events that they did not
> >> process because they had already closed the file descriptor (and thus
> >> owner == 0x0).
> >
> > Could you please confirm that it happens with the listening sockets ?
>
> Yes. I ran lsof on all processes before sending USR1, then attached a
> debugger the one gone haywire and checked the fd in the epoll event. It
> was one of the original listening sockets.
>
> Here is a nice way to trigger it:
> - run haproxy in deamon mode, but with nbproc 1
> - open a HTTP keep-alive connection (and keep it open)
> - send USR1 to the only worker (it should stay running, because of the
> ongoing connection)
> - open a new connection -> the worker goes to 100%
>
> Works all the time for me. Running lsof before and after the signal
> neatly shows how the worker closed the listening socket but the parent
> still has it.
I'm just confused with what the "parent" here can be since you have nbproc 1,
so there's a single process overall. Or maybe you're using the debugger to
breakpoint the parent process and prevent it from leaving ?
I was thinking that starting with nbproc 2 with long connections on both
would cause the issue to happen as well.
(...)
> > Maybe we could simply have a flag in the fdtab indicating that the fd is
> > shared (basically all FDs opened before the fork would have it), and decide
> > in poller.clo() to only call epoll_ctl(DEL) when this flag is set, then
> > clear
> > it. It would seam reasonable in my opinion. Do you think that you could
> > extend
> > your current patch to do that ?
>
> Good point on the performance issue. I whipped up a new patch, I hope
> this is what you meant? I surely haven't yet fully understood all
> architecture and design decisions, so feel free to nitpick :)
Yes, that's about the principle, though I thought about doing it slightly
differently : in my opinion the problem is not caused by the FD being a
listener, but by it being cloned across the fork(), eventhough for now
there's a 1:1 relation between the two, that could change in the future
(eg: a shared inter-process socket for stats or whatever).
I would have proceeded slightly differently :
- in fd_insert(), I'd have added :
fdtab[fd].cloned = 0;
- in ev_epoll.c:_do_fork(), I'd have marked all the opened FDs as cloned :
for (i = 0; i <= maxfd; i++)
if (fdtab[fd].owner)
fdtab[fd].cloned = 1;
That way, we'd be sure that all open FDs are properly tagged, whatever they
are used for.
> Also, I wasn't sure if this should go into the UNIX sockets, too? Now
> that I am writing this, I actually suppose so. Let me know if I should
> add it there, too!
One more reason for doing the job in the poller, otherwise when adding
new protocols, it's easy to forget about this!
Best regards,
Willy