The slow machines are at the end of the list. When there's a large build going on, all the machines get work scheduled: the fast ones will do most of the files (say 80%) while the slower ones do the remaining 20%.
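To illustrate the kind of speed-aware distribution being discussed, here is a minimal sketch (not distcc's actual scheduler) of greedily handing each job to the host expected to finish it soonest. The host names and speed weights are invented for the example.

```python
def assign_jobs(jobs, hosts):
    """Greedily assign each job to the host whose queue finishes earliest.

    hosts: dict of host name -> relative speed (higher is faster).
    Returns a dict of host name -> list of assigned jobs.
    """
    finish_time = {name: 0.0 for name in hosts}
    schedule = {name: [] for name in hosts}
    for job in jobs:
        # Cost of one job on host h is 1/speed; pick the host that would
        # complete its queue (including this job) the soonest.
        best = min(hosts, key=lambda h: finish_time[h] + 1.0 / hosts[h])
        schedule[best].append(job)
        finish_time[best] += 1.0 / hosts[best]
    return schedule

# Two fast hosts and one slow host: the slow one ends up with only a
# small share, so it is less likely to hold up the final link step.
hosts = {"fast1": 4.0, "fast2": 4.0, "slow1": 1.0}
schedule = assign_jobs([f"file{i}.c" for i in range(10)], hosts)
```

With these made-up weights the two fast hosts absorb roughly 90% of the files, matching the 80/20-style split described above.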
So it's likely that for the last batch of files to compile, all the fast machines will be done, the build system will be waiting to start linking, and one or two objects will still be compiling on the slow machines.

I realize now that the interesting speedup situation is when there are few files to rebuild, as that's usually when you are actively working on the source and want to wait as little as possible. When you start a large full rebuild, you don't really care about saving an extra 15 secs, you just go get more java :)

I could work around the problem by using the local cluster when it's a small rebuild (therefore avoiding a wait on a slow remote machine), and throwing the remote cluster in only when I do a full build. That's more work for the end user though.

I agree with Martin that 'in general' distcc should gather information about the speed of the various machines and use that in its work distribution heuristics.

When you send the same compile to several machines, you will likely want to configure things so that you don't eat resources from other users. But I'm not sure it's a big issue, it's mostly a social thing between developers... afaik it'd be fairly easy to "DoS" a distcc cluster with the current distcc implementation.

TTimo

On Mon, 25 Aug 2003 22:14:54 +1000 Martin Pool <[EMAIL PROTECTED]> wrote:

> On Mon 2003-08-25, Dag Wieers wrote:
>
> > The way I thought Timothee meant it, was like this:
> >
> > Whenever a host has finished processing jobs and distcc (make)
> > is out of jobs but still waiting for results on some jobs, it
> > could resend the (already preprocessed) jobs to any idle
> > machines and use the result from the fastest machines that can
> > deliver it (and finish the other ones).
> >
> > Of course this means that the distcc instances have to have some
> > shared knowledge about what jobs are still ongoing and access to
> > the preprocessed output.
> > But I like the idea, especially in environments where some of the
> > servers in your cluster are sometimes used for heavy duty. If you're
> > waiting, you might as well bet on another horse (especially when it is
> > at no extra cost).
>
> But there is a cost: other jobs which might arrive in the future or
> which might be sent by another user will not be able to use those
> servers. So we want to do this only when the possible gain is so great
> as to justify the risk.
>
> One can imagine a naive algorithm wasting a lot of time.
>
> To turn the question around: why don't we just schedule the job on the
> nearby machine in the first place?
>
> --
> Martin

__
distcc mailing list            http://distcc.samba.org/
To unsubscribe or change options:
http://lists.samba.org/cgi-bin/mailman/listinfo/distcc
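The "bet on another horse" idea above can be sketched as speculative duplicate dispatch: send the same (already preprocessed) job to several hosts and keep whichever result arrives first. This is a toy illustration, not distcc code; the host names and speeds are invented and compile times are faked with sleeps.

```python
import threading
import time

def compile_on(host, job, speed, result, lock):
    # Fake compile time: the slower the host, the longer the wait.
    time.sleep(0.3 / speed)
    with lock:
        # First finisher wins; later duplicate results are simply dropped.
        result.setdefault(job, host)

def speculative_compile(job, hosts):
    """Send `job` to every host in `hosts`; return the host whose result we kept.

    hosts: dict of host name -> relative speed (higher is faster).
    """
    result, lock = {}, threading.Lock()
    threads = [threading.Thread(target=compile_on, args=(h, job, s, result, lock))
               for h, s in hosts.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[job]

winner = speculative_compile("last_object.o", {"slow1": 1.0, "fast1": 10.0})
```

As Martin points out, the cost is that the duplicate jobs tie up hosts that other users or future jobs could have used, so a real implementation would only do this when the expected gain clearly outweighs that risk.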
