Hi Conrad,

On Sat, May 17, 2014 at 08:31:46PM +0200, Conrad Hoffmann wrote:
> Hi Willy,
> 
> On 05/17/2014 04:39 AM, Willy Tarreau wrote:
> > On 05/15/2014 11:26 AM, Conrad Hoffmann wrote:
> >> If haproxy has been running for a certain amount of time (this part
> >> makes reproduction cumbersome), we quite often run into the following
> >> issue when restarting haproxy: if some of the workers have active
> >> connections that they must wait for to be terminated, a fraction of
> >> those would start hogging CPU like crazy.
> > 
> > I had already observed that case myself and thought it was gone because
> > as you explain it's very hard to reproduce.
> 
> After sleeping a night over it, I came up with a sure-fire way to
> reproduce it. See below.
> 
> >> Attaching a debugger showed
> >> that epoll_wait() would immediately return with events that they did not
> >> process because they had already closed the file descriptor (and thus
> >> owner == 0x0).
> > 
> > Could you please confirm that it happens with the listening sockets ?
> 
> Yes. I ran lsof on all processes before sending USR1, then attached a
> debugger the one gone haywire and checked the fd in the epoll event. It
> was one of the original listening sockets.
> 
> Here is a nice way to trigger it:
>  - run haproxy in deamon mode, but with nbproc 1
>  - open a HTTP keep-alive connection (and keep it open)
>  - send USR1 to the only worker (it should stay running, because of the
>    ongoing connection)
>  - open a new connection -> the worker goes to 100%
> 
> Works all the time for me. Running lsof before and after the signal
> neatly shows how the worker closed the listening socket but the parent
> still has it.

I'm just confused with what the "parent" here can be since you have nbproc 1,
so there's a single process overall. Or maybe you're using the debugger to
breakpoint the parent process and prevent it from leaving ?

I was thinking that starting with nbproc 2 with long connections on both
would cause the issue to happen as well.

(...)
> > Maybe we could simply have a flag in the fdtab indicating that the fd is
> > shared (basically all FDs opened before the fork would have it), and decide
> > in poller.clo() to only call epoll_ctl(DEL) when this flag is set, then 
> > clear
> > it. It would seam reasonable in my opinion. Do you think that you could 
> > extend
> > your current patch to do that ?
> 
> Good point on the performance issue. I whipped up a new patch, I hope
> this is what you meant? I surely haven't yet fully understood all
> architecture and design decisions, so feel free to nitpick :)

Yes, that's about the principle, though I thought about doing it slightly
differently : in my opinion the problem is not caused by the FD being a
listener, but by it being cloned across the fork(), eventhough for now
there's a 1:1 relation between the two, that could change in the future
(eg: a shared inter-process socket for stats or whatever).

I would have proceeded slightly differently :

  - in fd_insert(), I'd have added :

        fdtab[fd].cloned = 0;

  - in ev_epoll.c:_do_fork(), I'd have marked all the opened FDs as cloned :

        for (i = 0; i <= maxfd; i++)
           if (fdtab[fd].owner)
              fdtab[fd].cloned = 1;

That way, we'd be sure that all open FDs are properly tagged, whatever they
are used for.

> Also, I wasn't sure if this should go into the UNIX sockets, too? Now
> that I am writing this, I actually suppose so. Let me know if I should
> add it there, too!

One more reason for doing the job in the poller, otherwise when adding
new protocols, it's easy to forget about this!

Best regards,
Willy


Reply via email to