nadim wrote:
We think this has to do with the coloring codes 'colorgcc' adds, but we don't understand what the problem is here. Is distcc using 'gcc -E' for pre-processing? 'colorgcc' happily mixes STDOUT and STDERR. This is a rather obvious 'colorgcc' error if the pre-processed code is output to STDOUT. If you tell us how the preprocessor is used, we can see how to modify 'colorgcc' so all distcc users can use it too.

Yes, I think it is a colorgcc bug relating to -E. If you set DISTCC_VERBOSE=1 you can see how it's being invoked.
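
To illustrate the stdout/stderr point, here is a minimal sketch of a wrapper that colors only the diagnostics and passes stdout through byte for byte. It is not colorgcc's code (colorgcc is a Perl script); the function name run_colored and the choice of red are made up for the example, and it assumes the wrapped command writes diagnostics to stderr only.

```python
import subprocess

RED = "\033[31m"    # ANSI escape: red foreground
RESET = "\033[0m"   # ANSI escape: reset attributes


def run_colored(argv):
    """Run argv and return (stdout_bytes, colored_stderr_text, returncode).

    stdout is returned untouched, so preprocessed source produced by
    something like 'gcc -E' would never pick up escape codes.
    Only stderr lines are wrapped in color codes.
    """
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    diagnostics = proc.stderr.decode(errors="replace")
    colored = "".join(
        RED + line + RESET + "\n" for line in diagnostics.splitlines()
    )
    return proc.stdout, colored, proc.returncode
```

A caller would write the first element to its own stdout, the second to its own stderr, and exit with the third, so that whatever invoked the wrapper still sees clean output and the real exit status.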



B/Sometimes, distcc/our build system hangs in a few strange ways
(this is a problem I have fixed in our code, still I wonder why I got the messages before)


We get this:

Mon Dec 13 14:18:06 2004
 31080  Receive     hdi_widgettextinputcreate.c                172.31.4.103[0]

Mon Dec 13 14:18:07 2004
 31080  Receive     hdi_widgettextinputcreate.c                172.31.4.103[0]

Mon Dec 13 14:18:08 2004
 31080  Receive     hdi_widgettextinputcreate.c                172.31.4.103[0]
Mon Dec 13 14:18:10 2004

Mon Dec 13 14:18:11 2004


but the build system is still waiting for the command to finish!

Again this was an error in the build system communication with build processes. What I don't understand is why distccmon-text writes 'receive'.

The compiler is receiving compilation results from the server. (Or, possibly, it was killed while in that state.)



This was also surprising:

[EMAIL PROTECTED] obigo]$ ps aux | grep cc
ali 1740 0.0 0.1 1972 516 pts0 SN 12:55 0:00 distcc -O2 -Wall -Wshadow -Wpointer-arith -I/devel/q04c/obigo/msf/msf_lib/intgr -I/devel/q04c/obigo/msf/lib -o /devel/q04c/obigo/projects/ali_grisar_runt/out_ali/msf/lib/hdi_widgetbargetvalues.o -c /devel/q04c/obigo/msf/lib/hdi_widgetbargetvalues.c
ali 1741 0.0 0.0 0 0 pts0 ZN 12:55 0:00 [cc] <defunct>
ali 2845 0.0 0.1 1944 668 pts1 SN 12:57 0:00 grep cc


Is distcc waiting for a zombie process here? The first idea we had was that SIGCHLD wasn't handled properly. But while we were talking about it, thinking the build was dead in the water, we were surprised to see the build complete!

I don't understand the problem. What's wrong with having a zombie present for a short period of time?
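
For background, a zombie is simply a child that has exited but whose parent has not yet collected its exit status. The sketch below shows that life cycle on a Unix system; it is not distcc code, and the function name reap_child_demo is invented for the example.

```python
import os


def reap_child_demo(exit_code=7):
    """Fork a child that exits at once; return (exit_status, entry_gone).

    Between the child's exit and the parent's waitpid() the child is a
    zombie: no memory or CPU, just a process-table entry holding its status.
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately without running Python cleanup handlers.
        os._exit(exit_code)

    # Parent: block until the child has exited, but leave it unreaped
    # (WNOWAIT). At this point 'ps' would show it as <defunct>.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    # Reap the zombie and fetch its exit status.
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status)

    # A second wait fails because the entry no longer exists.
    try:
        os.waitpid(pid, 0)
        entry_gone = False
    except ChildProcessError:
        entry_gone = True
    return code, entry_gone
```

A parent that is busy with something else, such as network I/O, may not reach its wait call right away, which is one way a short-lived defunct entry can show up in 'ps' without anything being wrong.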


After fixing the communication between PBS and the build processes, I made a test run on my three boxes, compiling around 200K lines of code in 500 source files. It built in 45s with my little cluster (3 GHz + 1 GHz + 700 MHz), while taking 70s on the 3 GHz box alone. I ran the test 200 times while looking at the news. I got a surprise though. I monitored the CPU loads and saw that one of the boxes stopped compiling after some time. distcc did the right thing, as it would not use that node on the next build. The problem was that distcc had filled my $TMP directories with 4000 rti files (those produced by gcc). If I removed the files, the compiling ran smoothly again. This didn't happen on the other 2 computers. Does anyone have an idea why this occurred?

Maybe gcc leaves temp files around if it's interrupted, or maybe you were invoking gcc with -save-temps (or whatever it's called).
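
If leftover files are suspected, one rough way to check is to list files in the temp directory that have not been modified for a while. This is only a sketch: the function name stale_files and the age threshold are hypothetical, and it makes no assumption about what the leftover files are called.

```python
import os
import time


def stale_files(directory, max_age_seconds, now=None):
    """Return sorted names of regular files older than max_age_seconds.

    Age is measured from the last modification time. Symlinks and
    subdirectories are skipped.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                stale.append(entry.name)
    return sorted(stale)
```

Running this against the temp directory on the affected box before and after a build would show whether files are accumulating, and comparing the result across the three machines might hint at why only one of them fills up.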



--
Martin
__ distcc mailing list http://distcc.samba.org/
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/distcc
