Hi all,

I have been investigating an issue with job counts for the last few days. It may or may not be a problem in distcc; it's still too early to conclude, but it is at least related to distcc.
Here's the problem in detail. I noticed that sometimes, when I compiled something using distcc, I would observe (in distccmon-gnome, or -text, for that matter) fewer jobs than I asked for. I know that there are sometimes not enough jobs to distribute because of dependency issues, but that wasn't the case here. This is a project I compile very often, and it normally uses 4 jobs when I ask for 4. Except that, from time to time, I get only 3 on a given compilation run. Or even only 2, although less frequently. Or even 1, although even less frequently.

Facts:

* I can only observe the phenomenon when compiling a Linux 2.6 kernel tree.
* I can only observe the problem when *running* a Linux 2.6 kernel.
* The problem happens randomly. I can do "make CC=distcc -j4" and see 4 jobs, interrupt the compilation, restart it, and have only 3. Or the other way around.
* Once a job is "missing", it will not come back for that compilation run. Likewise, when the compilation starts with all the requested jobs, jobs won't disappear. Looks like a make-init issue (see below).
* I observe the problem on two different machines (the two machines of my farm), both running Slackware 9.1 and a hand-compiled distcc 2.16.
* I could reproduce the problem with DISTCC_HOSTS="localhost" and -j2. After several tries, one compilation run showed a single job.
* Of course, the test cases in which I didn't see the problem don't prove it can't happen there. It may simply be less frequent, so I wouldn't catch it within a limited number of tries.

Guesses:

I would suspect (GNU) make more than distcc, since the problem is either present or absent for a whole compilation run. My distribution comes with make 3.80. I tried compiling it myself; that didn't change a thing. I tried compiling 3.79.1 myself; that didn't help either. I cannot try older versions, since Linux 2.6 is said to require 3.79.1. However, the use of distcc somehow seems to trigger the problem.
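If the "make-init" guess is right, one way to picture it is GNU make's jobserver. With -jN, the top-level make creates a pipe holding N-1 one-byte tokens and keeps one implicit slot for itself. Running each extra job requires reading a token, and the token is written back when that job finishes. Under that model, a token lost at startup is never regenerated, so the whole run is capped one job lower. The Python below is a toy model built on that assumption, not make's actual code. The function name run_with_token_pool and its lost_tokens parameter are hypothetical, and the model does not explain *why* a token would be lost.

```python
import os


def run_with_token_pool(jobs, lost_tokens=0):
    """Toy model (hypothetical): how many jobs can run at once with a
    make-style token pipe, if lost_tokens tokens vanish at startup."""
    read_fd, write_fd = os.pipe()
    try:
        # Assumed jobserver setup: the top-level make seeds jobs - 1 tokens.
        os.write(write_fd, b"+" * (jobs - 1))
        # Simulate tokens that are read at startup but never written back.
        for _ in range(lost_tokens):
            os.read(read_fd, 1)
        os.set_blocking(read_fd, False)
        running = 1  # the implicit slot needs no token
        while True:
            try:
                if not os.read(read_fd, 1):
                    break
            except BlockingIOError:
                break  # pipe is empty: a real make would block here
            running += 1  # one more token acquired, one more job started
        return running
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    for lost in range(3):
        print(f"-j4 with {lost} lost token(s): "
              f"{run_with_token_pool(4, lost)} concurrent jobs")
```

In this model, -j4 drops to 3, then 2 concurrent jobs as tokens go missing, and -j2 with one lost token runs a single job. That would match the localhost -j2 observation above, but it remains only an illustration of the hypothesis.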
I think I observed it once with a gcc-only compilation, but I am unable to reproduce it now, so I'm not sure. Of course, the problem is easier to spot with distcc, because distccmon is meant to monitor the compilation jobs.

The only respect in which my distribution is not "Linux 2.6 compliant" is procps. Since neither make nor distcc is linked with libproc, I suppose that isn't the problem, but I may try upgrading it if someone thinks it could be.

Questions:

1* Has anyone heard of this problem before?
2* Could someone try to reproduce it? Basically, you have to run Linux 2.6, compile a 2.6 kernel tree using distcc while running distccmon-gnome, and interrupt and restart the compilation over and over again. In my case, there will regularly be runs with 3 jobs instead of the expected 4. The failure frequency varies: sometimes I need a dozen runs before I see the problem, and sometimes I need several runs to *not* run into it.
3* Any idea what the problem could be? How should I investigate? I tried distcc's verbose mode, but can't see anything relevant in the logs. Maybe I just don't know what to look for?

Thanks,
--
Jean "Khali" Delvare
http://khali.linux-fr.org/
