[cc'd to list]

On 24 Oct 2002, Stefan Becker <[EMAIL PROTECTED]> wrote:
> OK, how does distccd then know
>
>  - Machine X has n processors vs. Y with m processors
>
>  - Machine X has faster processors than Y
>
> -> algorithm should run more processes on X than on Y.
The current algorithm, in case you didn't see it, is described at

  http://distcc.samba.org/manual/html/distcc-2.html#ss2.7

In fact, the general case might be that you have

  Machine A: one 2GHz processor
  Machine B: four 0.5GHz processors

In this situation it's probably best to try to keep one job running on A, and then put the next four on B. Except perhaps it's not: B might already be very heavily loaded, while A might be idle. Or A might be low on memory.

And this is only for the steady state. In practice, there will often be periods where we have only a limited number of jobs to run: for example, because we're running a ./configure script that builds many small jobs in series, or because a recursive makefile builds just a few files in each directory. In this case, what would happen in a directory with just five files, all launched at once? It would seem that the best approach is to run one on A and four on B. However, that might leave us waiting around with A idle for a long time. There is also the complication of hyperthreaded CPUs to consider.

Also, since sending source probably floods the network, it may be better to send one source file completely and let the server start on it, rather than having four files jammed up trying to get through at the same time.

In general, if there is only one task able to run, it is better to run it locally. However, that's not true when the local machine is much slower than some others, and possibly not true if a lot of jobs are going to start shortly and we will need the local machine for running cpp.

I expect the scheduler can be substantially improved, but it is a nontrivial problem. It's harder than it might be because we don't know what Make is going to do in the future: we can only decide based on what has happened in the past.

> Please don't tell me the algorithm uses "load average", because from my
> experiences with parallel build systems this value is absolutely
> useless, ie.
> machine X with a load average 5 happily churns away at any
> job, while machine Y grinds to a halt and slows down the whole
> build.

Yes, I'd already concluded that load average would not be very helpful: load average changes too slowly (on a scale of minutes), whereas compiler processes start and finish very quickly (in a few seconds). So I think load average will react too slowly to be useful in deciding where to run compiler jobs. I suspect (without proof) that a central controller would have similar problems.

In 0.13cvs I've tried having client-side process limits per machine, but that doesn't seem to work very well.

At the moment I think the best approach is this:

The server limits the number of compiler tasks it will run at any one time. This can be set explicitly as a distccd option, or, on platforms where we can find it out, can default to the number of CPUs. I'm not quite sure how the server will tell the client to slow down. Perhaps it will just not accept any more connections, or perhaps it will be slow to accept a job.

The client will spread load across all machines, and use client-side locks to try to make sure only one file is in flight to any server at any point in time.

As a further refinement, the client can keep a rolling-average estimated speed for each server. If all are idle, it will prefer the fastest one. The speed might be computed as

  sqrt(source_bytes * object_bytes) / total_time

with a rolling average something like

  avgspeed = 0.95 * avgspeed + 0.05 * curr_speed

That will be useful when running things like ./configure, where there is no parallelism.

-- 
Martin

_______________________________________________
distcc mailing list
[EMAIL PROTECTED]
http://lists.samba.org/cgi-bin/mailman/listinfo/distcc
