I found the snippet of the distcc log for a FIN_WAIT1 connection. The child process that distcc spawns exits with an error, so distcc tries to compile locally instead. Both the distributed attempt and the local compile exit with the same error code. The file being compiled really does have a compile error, which shows up in the log output (usually foo.cpp is trying to #include a header that doesn't exist, or some similar error).
Hien

---- Original Message ----
From: Martin Pool
Date: Wed 9/4/02 20:35
To: Hien D. Ngo
Cc: [EMAIL PROTECTED]
Subject: Re: FIN_WAIT1 bug with RH 6 (Re: [distcc] distcc 0.9 released)

On 4 Sep 2002, "Hien D. Ngo" <[EMAIL PROTECTED]> wrote:
> distcc continues to run on my RH 6 test boxes, but now leaves a ton
> of FIN_WAIT1 processes around (284 total at last count.) My RH
> 7.2/7.3 boxes don't exhibit this problem and are still running
> without problems as of this writing.

I'm happy to hear about the 7.x machines working.

> =======
> distcc
> =======
> [EMAIL PROTECTED] $ netstat -to | grep 3568
> tcp 0 69 build03.foo.com:3568 build04.foo.com:4200 FIN_WAIT1 off (0.00/0/0)
> [EMAIL PROTECTED] $ lsof -i:3568

(Let me step through it to be clear in my own mind.)

This is a client: it has a socket open to the server, has closed the local end, and is waiting for the server to ACK that close and then send its own FIN. Also, there are 69 bytes still buffered, waiting to be either ACKed by the server or retransmitted. I am a little surprised that there is no timer running, because the client ought to be retransmitting the queued data in an attempt to get the server to ACK the last 69 bytes.

According to lsof, no program has the socket open, which would explain why it's closed.

According to your log from the server, the server is waiting to receive the compiler arguments, so the client should not normally have exited at that point. So I wonder whether the client crashed or exited abnormally. It would be interesting to look for client-side core files (making sure core dumps are enabled), or to look at the verbose client log to see why the client went away, or failing that, what it managed to do before it left.

> =======
> distccd
> =======
> [EMAIL PROTECTED] $ netstat -to | grep 3568
> tcp 0 0 build04.foo.com:4200 build03.foo.com:3568 ESTABLISHED off (0.00/0/0)

It looks like everything is fine on the server side: it's trying to read more data, and isn't getting any.

So overall I am inclined to suspect that there is a kernel bug relating to FIN_WAIT1 on RH 6.2, and also that something yet to be determined is causing distcc to quit early.

--
Martin

_______________________________________________
distcc mailing list
[EMAIL PROTECTED]
http://lists.samba.org/cgi-bin/mailman/listinfo/distcc
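Martin's diagnosis rests on three columns of the client's `netstat -to` row: the state (`FIN_WAIT1`), a non-zero Send-Q (69 unacknowledged bytes), and a timer column reading `off` where a retransmit timer would be expected. With 284 FIN_WAIT1 entries reported on the RH 6 boxes, it may help to pick these out mechanically. A minimal Python sketch, assuming the seven-column layout seen in these captures (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, Timer); the class and function names are hypothetical and not part of distcc or net-tools:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TcpRow:
    """One TCP row of `netstat -to` output (assumed 7-column layout)."""
    proto: str
    recv_q: int
    send_q: int   # bytes sent but not yet acknowledged by the peer
    local: str
    foreign: str
    state: str
    timer: str    # "on", "off", "keepalive", ...; the "(0.00/0/0)" detail is dropped


def parse_netstat_line(line: str) -> Optional[TcpRow]:
    """Parse a row such as
    'tcp 0 69 build03.foo.com:3568 build04.foo.com:4200 FIN_WAIT1 off (0.00/0/0)'.
    Returns None for headers, blank lines and non-TCP rows."""
    fields = line.split()
    if len(fields) < 7 or not fields[0].startswith("tcp"):
        return None
    try:
        recv_q, send_q = int(fields[1]), int(fields[2])
    except ValueError:
        return None
    return TcpRow(fields[0], recv_q, send_q, fields[3], fields[4], fields[5], fields[6])


def is_stuck_fin_wait1(row: TcpRow) -> bool:
    """FIN_WAIT1 with unacknowledged data queued but no timer running:
    the combination Martin found surprising."""
    return row.state == "FIN_WAIT1" and row.send_q > 0 and row.timer == "off"


def stuck_connections(netstat_output: str) -> List[TcpRow]:
    """Return every row of saved `netstat -to` output that matches is_stuck_fin_wait1."""
    rows = (parse_netstat_line(line) for line in netstat_output.splitlines())
    return [row for row in rows if row is not None and is_stuck_fin_wait1(row)]
```

Fed the client and server rows quoted in Martin's message, this flags only the client's FIN_WAIT1 row (Send-Q 69, timer off); the server's ESTABLISHED row with an empty Send-Q is ignored.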
=======
distcc
=======
[EMAIL PROTECTED] $ netstat -to | grep 4883
tcp 0 44 build03.foo.com:4883 build05.foo.com:4200 FIN_WAIT1 off (0.00/0/0)
[EMAIL PROTECTED] $ lsof -i:4883

=======
distccd
=======
[EMAIL PROTECTED] $ netstat -to | grep 4883
tcp 0 0 build05.foo.com:4200 build03.foo.com:4883 ESTABLISHED off (0.00/0/0)
[EMAIL PROTECTED] $ lsof -i:4883
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
distccd 26417 ngoh 5u inet 1549188 TCP build05.foo.com:4200->build03.foo.com:4883 (ESTABLISHED)
[EMAIL PROTECTED] $ strace -p26417
about to attach 6731
read(5, <unfinished ...>

[EMAIL PROTECTED] $ grep 23470 /tmp/distcc.log
distcc[23470] (dcc_scan_args) scanning arguments: g++ -fPIC -g -O -Wall -pipe -pthread -Wno-non-template-friend -Wwrite-strings -ffor-scope -I./shadow/linux -I../../corba_util/linux.bld -I. -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -DRW_NO_STL -ftemplate-depth-50 -D_POSIX_THREADS -D_POSIX_THREAD_SAFE_FUNCTIONS -D_REENTRANT -DACE_HAS_AIO_CALLS -DACE_HAS_EXCEPTIONS -I/usr/local/ACE-5.2 -I/usr/local/ACE-5.2/TAO -I/usr/local/ACE-5.2/TAO/tao -I/usr/local/ACE-5.2/TAO/tao/PortableServer -I/usr/local/ACE-5.2/TAO/orbsvcs/orbsvcs -I/scratch/users/ngoh/ver/hdr/portable/tao -I/scratch/users/ngoh/ver/hdr/portable/tao -DCORBA_IMPL_TAO -DRW_CENTURY_REQD -DRW_MULTI_THREAD -D_REENTRANT -I/usr/local/RogueWave-7.1.1 -DMY_RW_CTLIB_=/usr/local/RogueWave-7.1.1/lib/libsdb12d.so -DCOMPAT_LAYER_NO_MIN_MAX -c -o ../../corba_util/linux.bld/BlockingResultListener.o BlockingResultListener.cpp
distcc[23470] (dcc_scan_args) found object file "../../corba_util/linux.bld/BlockingResultListener.o"
distcc[23470] (dcc_scan_args) found input file "BlockingResultListener.cpp"
distcc[23470] compile from BlockingResultListener.cpp to ../../corba_util/linux.bld/BlockingResultListener.o
distcc[23470] (dcc_parse_hosts) found tcp token "build04.foo.com"
distcc[23470] (dcc_parse_hosts) found tcp token "build05.foo.com"
distcc[23470] (dcc_parse_hosts) found tcp token "rizzo.foo.com"
distcc[23470] (dcc_parse_hosts) found tcp token "kermit.foo.com"
distcc[23470] (dcc_try_lock_host) locked /tmp/distcc_00002493/lock_build04.foo.com_0000000
distcc[23470] (dcc_pick_buildhost) building on build04.foo.com
distcc[23470] (dcc_set_output) changed output from "../../corba_util/linux.bld/BlockingResultListener.o" to "/tmp/distcc_00002493/cppout_0000023470.i"
distcc[23470] (dcc_set_output) command after: g++ -fPIC -g -O -Wall -pipe -pthread -Wno-non-template-friend -Wwrite-strings -ffor-scope -I./shadow/linux -I../../corba_util/linux.bld -I. -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -DRW_NO_STL -ftemplate-depth-50 -D_POSIX_THREADS -D_POSIX_THREAD_SAFE_FUNCTIONS -D_REENTRANT -DACE_HAS_AIO_CALLS -DACE_HAS_EXCEPTIONS -I/usr/local/ACE-5.2 -I/usr/local/ACE-5.2/TAO -I/usr/local/ACE-5.2/TAO/tao -I/usr/local/ACE-5.2/TAO/tao/PortableServer -I/usr/local/ACE-5.2/TAO/orbsvcs/orbsvcs -I/scratch/users/ngoh/ver/hdr/portable/tao -I/scratch/users/ngoh/ver/hdr/portable/tao -DCORBA_IMPL_TAO -DRW_CENTURY_REQD -DRW_MULTI_THREAD -D_REENTRANT -I/usr/local/RogueWave-7.1.1 -DMY_RW_CTLIB_=/usr/local/RogueWave-7.1.1/lib/libsdb12d.so -DCOMPAT_LAYER_NO_MIN_MAX -E -o /tmp/distcc_00002493/cppout_0000023470.i BlockingResultListener.cpp
distcc[23470] (dcc_spawn_child) forking to execute g++ -fPIC -g -O -Wall -pipe -pthread -Wno-non-template-friend -Wwrite-strings -ffor-scope -I./shadow/linux -I../../corba_util/linux.bld -I. -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -DRW_NO_STL -ftemplate-depth-50 -D_POSIX_THREADS -D_POSIX_THREAD_SAFE_FUNCTIONS -D_REENTRANT -DACE_HAS_AIO_CALLS -DACE_HAS_EXCEPTIONS -I/usr/local/ACE-5.2 -I/usr/local/ACE-5.2/TAO -I/usr/local/ACE-5.2/TAO/tao -I/usr/local/ACE-5.2/TAO/tao/PortableServer -I/usr/local/ACE-5.2/TAO/orbsvcs/orbsvcs -I/scratch/users/ngoh/ver/hdr/portable/tao -I/scratch/users/ngoh/ver/hdr/portable/tao -DCORBA_IMPL_TAO -DRW_CENTURY_REQD -DRW_MULTI_THREAD -D_REENTRANT -I/usr/local/RogueWave-7.1.1 -DMY_RW_CTLIB_=/usr/local/RogueWave-7.1.1/lib/libsdb12d.so -DCOMPAT_LAYER_NO_MIN_MAX -E -o /tmp/distcc_00002493/cppout_0000023470.i BlockingResultListener.cpp
distcc[23470] (dcc_spawn_child) child started as pid23516
distcc[23470] exec on build04.foo.com: g++ -fPIC -g -O -Wall -pipe -pthread -Wno-non-template-friend -Wwrite-strings -ffor-scope -I./shadow/linux -I../../corba_util/linux.bld -I. -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -DRW_NO_STL -ftemplate-depth-50 -D_POSIX_THREADS -D_POSIX_THREAD_SAFE_FUNCTIONS -D_REENTRANT -DACE_HAS_AIO_CALLS -DACE_HAS_EXCEPTIONS -I/usr/local/ACE-5.2 -I/usr/local/ACE-5.2/TAO -I/usr/local/ACE-5.2/TAO/tao -I/usr/local/ACE-5.2/TAO/tao/PortableServer -I/usr/local/ACE-5.2/TAO/orbsvcs/orbsvcs -I/scratch/users/ngoh/ver/hdr/portable/tao -I/scratch/users/ngoh/ver/hdr/portable/tao -DCORBA_IMPL_TAO -DRW_CENTURY_REQD -DRW_MULTI_THREAD -D_REENTRANT -I/usr/local/RogueWave-7.1.1 -DMY_RW_CTLIB_=/usr/local/RogueWave-7.1.1/lib/libsdb12d.so -DCOMPAT_LAYER_NO_MIN_MAX -c -o ../../corba_util/linux.bld/BlockingResultListener.o BlockingResultListener.cpp
distcc[23470] (dcc_open_socket_out) client got connection to build04.foo.com port 4200 on fd6
distcc[23470] (dcc_collect_child) child 23516 terminated with status 0x100
distcc[23470] (dcc_report_rusage) cpp resource usage: 0.080000s user, 0.090000s system
distcc[23470] (dcc_critique_status) ERROR: cpp on build03.foo.com failed with exit code 1
distcc[23470] (dcc_build_somewhere) Notice: failed to distribute, running locally instead
distcc[23470] exec on localhost: g++ -fPIC -g -O -Wall -pipe -pthread -Wno-non-template-friend -Wwrite-strings -ffor-scope -I./shadow/linux -I../../corba_util/linux.bld -I. -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -DRW_NO_STL -ftemplate-depth-50 -D_POSIX_THREADS -D_POSIX_THREAD_SAFE_FUNCTIONS -D_REENTRANT -DACE_HAS_AIO_CALLS -DACE_HAS_EXCEPTIONS -I/usr/local/ACE-5.2 -I/usr/local/ACE-5.2/TAO -I/usr/local/ACE-5.2/TAO/tao -I/usr/local/ACE-5.2/TAO/tao/PortableServer -I/usr/local/ACE-5.2/TAO/orbsvcs/orbsvcs -I/scratch/users/ngoh/ver/hdr/portable/tao -I/scratch/users/ngoh/ver/hdr/portable/tao -DCORBA_IMPL_TAO -DRW_CENTURY_REQD -DRW_MULTI_THREAD -D_REENTRANT -I/usr/local/RogueWave-7.1.1 -DMY_RW_CTLIB_=/usr/local/RogueWave-7.1.1/lib/libsdb12d.so -DCOMPAT_LAYER_NO_MIN_MAX -c -o ../../corba_util/linux.bld/BlockingResultListener.o BlockingResultListener.cpp
distcc[23470] (dcc_spawn_child) forking to execute g++ -fPIC -g -O -Wall -pipe -pthread -Wno-non-template-friend -Wwrite-strings -ffor-scope -I./shadow/linux -I../../corba_util/linux.bld -I. -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -I/scratch/users/ngoh/ver/hdr/shadow/linux -I/scratch/users/ngoh/ver/hdr -DRW_NO_STL -ftemplate-depth-50 -D_POSIX_THREADS -D_POSIX_THREAD_SAFE_FUNCTIONS -D_REENTRANT -DACE_HAS_AIO_CALLS -DACE_HAS_EXCEPTIONS -I/usr/local/ACE-5.2 -I/usr/local/ACE-5.2/TAO -I/usr/local/ACE-5.2/TAO/tao -I/usr/local/ACE-5.2/TAO/tao/PortableServer -I/usr/local/ACE-5.2/TAO/orbsvcs/orbsvcs -I/scratch/users/ngoh/ver/hdr/portable/tao -I/scratch/users/ngoh/ver/hdr/portable/tao -DCORBA_IMPL_TAO -DRW_CENTURY_REQD -DRW_MULTI_THREAD -D_REENTRANT -I/usr/local/RogueWave-7.1.1 -DMY_RW_CTLIB_=/usr/local/RogueWave-7.1.1/lib/libsdb12d.so -DCOMPAT_LAYER_NO_MIN_MAX -c -o ../../corba_util/linux.bld/BlockingResultListener.o BlockingResultListener.cpp
distcc[23470] (dcc_spawn_child) child started as pid23531
distcc[23470] (dcc_collect_child) child 23531 terminated with status 0x100
distcc[23470] (dcc_report_rusage) g++ resource usage: 0.990000s user, 0.210000s system
distcc[23470] (dcc_critique_status) ERROR: compile on build03.foo.com failed with exit code 1
distcc[23470] (dcc_exit) Notice: exit: code 1; self: 0.010000 user 0.010000 sys; children: 1.100000 user 0.420000 sys
distcc[23469] (dcc_spawn_child) child started as pid23470
distcc[23469] (dcc_collect_child) child 23470 terminated with status 0
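In the client log, each child's result appears as a raw wait() status (`0x100`), which distcc then reports as `exit code 1`. A minimal Python sketch that does roughly what `grep 23470 /tmp/distcc.log` did, but keeps only the child-exit, error and fallback entries and decodes the status with `os.WIFEXITED`/`os.WEXITSTATUS` (Unix only). It assumes one log entry per line and the message formats shown in this log; `summarize` and `decode_wait_status` are hypothetical helper names, not part of distcc:

```python
import os
import re
from typing import List

# Assumed entry format, one per line, e.g.
#   distcc[23470] (dcc_collect_child) child 23516 terminated with status 0x100
#   distcc[23470] exec on localhost: g++ ...
ENTRY = re.compile(r"^distcc\[(\d+)\] (?:\((\w+)\) )?(.*)$")
CHILD_STATUS = re.compile(r"child (\d+) terminated with status (0x[0-9a-fA-F]+|\d+)")


def decode_wait_status(status: int) -> str:
    """Describe a raw wait() status the way a shell would (Unix only)."""
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"unrecognised status {status:#x}"


def summarize(log_text: str, pid: int) -> List[str]:
    """Return the child-exit, error and fallback entries logged by one distcc pid."""
    summary = []
    for line in log_text.splitlines():
        m = ENTRY.match(line.strip())
        if not m or int(m.group(1)) != pid:
            continue  # not a distcc entry, or logged by another process
        func, msg = m.group(2), m.group(3)
        if func == "dcc_collect_child":
            s = CHILD_STATUS.search(msg)
            if s:
                child, status = s.group(1), int(s.group(2), 0)  # base 0 accepts "0x100" and "0"
                summary.append(f"child {child} {decode_wait_status(status)}")
        elif func in ("dcc_critique_status", "dcc_build_somewhere"):
            summary.append(msg)
    return summary
```

For pid 23470 this reports the cpp child (pid 23516) and the local fallback compile (pid 23531) both exiting with code 1, between the "cpp ... failed" error and the "running locally instead" notice, which is consistent with the genuine compile error Hien describes.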
