Turning off tcp_cork seems to make the problems under RH 6 go away: no more FIN_WAIT1 connections, and netstat is completely clean. Turning tcp_cork off also works for me on the RH 7 machines, so I can happily build with distcc now on all my machines :)

BTW, the FIN_WAIT1 connections never die off; they just keep accumulating as far as I can tell.
No weird firewalling in between the various machines (though there may be several routers in between).

Thanks much for the fixes.

Hien

---- Original Message ----
From: Martin Pool
Date: Wed 9/4/02 22:29
To: Hien D. Ngo
Cc: [EMAIL PROTECTED]
Subject: Re: More debugging info for FIN_WAIT1 bug with RH 6

On 4 Sep 2002, "Hien D. Ngo" <[EMAIL PROTECTED]> wrote:

> I found the snippet of the distcc log for a FIN_WAIT1 connection. The child process
> that is spawned exits so it tries to compile locally. Both the remote and local
> compiles exit with the same error code. The file being compiled actually exits with
> a real compile error in the log output (usually foo.cpp is trying to #include a
> header that doesn't exist or some such error.)

OK, that would explain why the client process goes away. Just dropping the socket halfway through is the expected behaviour. To get a small speed boost we start opening the socket before the preprocessor has completed, and in the relatively rare case where the preprocessor fails, we just drop the socket. This ought to cause the server to complain a little but cope.

I am pretty sure that getting stuck in FIN_WAIT1 with no timer indicates a client kernel bug. Perhaps not using TCP_CORK would avoid it?

You don't have any weird firewalling or routing stuff between the machines, do you?

Aside from that there is probably not much more that we can do. I guess the sockets and server processes will go away eventually.

--
Martin

_______________________________________________
distcc mailing list
[EMAIL PROTECTED]
http://lists.samba.org/cgi-bin/mailman/listinfo/distcc
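
For readers unfamiliar with the option discussed in this thread, here is a rough sketch of what corking a TCP socket looks like and how it can be made optional. It is only an illustration in Python, not distcc's actual code (distcc is written in C). The helper names `set_cork` and `send_request` and the `use_cork` flag are made up for this example, and `TCP_CORK` is assumed to be available only on Linux.

```python
import socket

# TCP_CORK is a Linux-specific option; on other platforms the constant
# is missing, so treat it as unavailable rather than failing.
TCP_CORK = getattr(socket, "TCP_CORK", None)


def set_cork(sock, enabled):
    """Set or clear TCP_CORK on sock. Returns True if the option was applied."""
    if TCP_CORK is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_CORK, 1 if enabled else 0)
    except OSError:
        # Not a TCP socket, or the kernel refused the option.
        return False
    return True


def send_request(sock, header, body, use_cork=True):
    """Send header and body, optionally corked so the kernel can coalesce them.

    use_cork is a hypothetical switch, analogous to turning tcp_cork off
    as described in the thread. Returns True if the socket was corked.
    """
    corked = use_cork and set_cork(sock, True)
    try:
        sock.sendall(header)
        sock.sendall(body)
    finally:
        if corked:
            # Removing the cork flushes any partial frames still queued.
            set_cork(sock, False)
    return corked
```

With `use_cork=False` the data is sent with ordinary socket behaviour and the cork option is never touched, which is the kind of workaround reported above.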
