Hi Cyril,

On Tue, Sep 21, 2010 at 01:50:45AM +0200, Cyril Bonté wrote:
> Hi Willy and Jozsef,
> 
> Le lundi 20 septembre 2010 23:42:44, R.Nagy József a écrit :
> > > (...)
> > > Very nice, now we know that the FD does not get corrupted, but when
> > > haproxy wants to use it, it's already closed on the other side. Probably
> > > a TCP rule causes a reject that closes the connection, and there is a
> > > possible return path that escapes from the control, leading to
> > > frontend_accept() trying to continue to use the closed FD.
> > > 
> > > I'll now try to find something in that spirit.
> > > 
> > > Also, it looks like only TCP is affected by this, because your
> > > unix-stream connection worked like a charm, and used the same FD (7).
> > > 
> > > I don't see anything suspect in the traces, but they clearly help
> > > eliminate wrong guesses.
> > > 
> > > Thanks Joe, I'll keep you informed if I find anything !
> 
> I don't know if it can help you but tonight I've installed FreeBSD in a VM
> and could easily reproduce the issue.
> Let me know if you want me to test some patches so that Jozsef doesn't need
> to break his production traffic.
> 
> I'll try to find time in the next few days to add some debug output to track
> the origin of the issue.
> 
> As a last test (and quite late one in the night so I must stop here :-)), at 
> the "out_delete_cfd" label, if I replace "return -1" by "return 0", I don't 
> reproduce the issue.

That's interesting, because the "return -1" is there to take the error path,
and that path can only be reached once setsockopt() has returned -1 at least
once. What I suspect is that sometimes we get a -1 here because the client has
reset the connection just after it was accepted. We then take the error path,
and something there incorrectly unrolls what was done. I've looked again and
can't find what (fd_delete() is done, then all the free() and close() calls).
Well, fd_delete() already does a close(), so maybe we're having an issue on
FreeBSD with two consecutive close() calls on the same fd that we don't have
on other OSes. I think we could move the fd_delete() to session.c instead of
frontend.c, since session.c is the one that does the fd_insert().
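
To make the double-close hypothesis concrete, here is a minimal standalone
sketch. It is not haproxy code, and both helper names are made up for
illustration. It only shows generic POSIX behaviour: a second close() on the
same fd either fails with EBADF or, if the number has been handed out again in
the meantime, silently closes an unrelated descriptor. Whether this is what
actually happens on the error path is still only a guess.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: close the same fd twice and return the errno of the
 * second close(), 0 if it unexpectedly succeeded, -1 on a setup failure. */
int second_close_errno(void)
{
    int p[2];

    if (pipe(p) == -1)
        return -1;
    close(p[1]);
    if (close(p[0]) == -1)      /* first close, like the one in fd_delete() */
        return -1;
    errno = 0;
    if (close(p[0]) == -1)      /* second close on the same number */
        return errno;
    return 0;
}

/* Hypothetical helper: returns 1 if a stale second close() ended up closing
 * a descriptor that was allocated after the first close(). */
int stale_close_kills_reused_fd(void)
{
    int a[2], b[2];
    int stale, reused, rc, killed;

    if (pipe(a) == -1)
        return -1;
    stale = a[0];
    close(a[0]);                /* first close: the number becomes free */
    close(a[1]);

    if (pipe(b) == -1)          /* lowest free numbers are handed out again */
        return -1;
    reused = (b[0] == stale || b[1] == stale);

    rc = close(stale);          /* stale second close from the "error path" */
    killed = reused && rc == 0 && fcntl(stale, F_GETFD) == -1;

    /* Clean up whichever end was not hit by the stale close. */
    if (b[0] != stale)
        close(b[0]);
    if (b[1] != stale)
        close(b[1]);
    return killed;
}
```

In a single-threaded test the first helper is expected to report EBADF. The
second case is the dangerous one, because the stale close() succeeds and
nothing signals that the wrong connection was closed.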

But anyway, I think that in your tests with your change, you should see the
message at least once, with the difference that it is not fatal.

Thanks!
Willy

