Oops: substitute "Scott Lystig Fritchie" for "<<<>>>" in the message. Sorry, Scott.
Vic

--- Victor Norman <[EMAIL PROTECTED]> wrote:

> All,
>
> Over the last couple of weeks I've been testing distcc (with pcons (parallel cons)) to see how it performs when multiple builds are done at once using a single, shared, heterogeneous compilation farm, with varying levels of parallelism (-j values), varying numbers of simultaneous builds, and varying numbers of hosts in the compilation farm. This posting contains my results. But first, some info about my setup.
>
> o My tree contains 2480 .c/.cc files in 148 directories. There are 97 directories that have 20 or fewer files each; the rest have more than 20, with 2 directories having over 200 each, and one directory having 548 files in it.
>
> o My compilation farm has 40 CPUs in it, on 18 machines. The machines all run Solaris 2.x, where x is 6, 7, or 8. One machine has 8 CPUs, one has 4, and the rest have 2 or 1. They vary from pretty fast to pretty slow.
>
> o For one of the tests, I used only my fastest machines: 23 CPUs in 8 machines.
>
> o I use gcc 2.95.2 -- a very old version of the compiler. Don't ask why we don't upgrade; it is a long, boring story. :-(
>
> o I used distcc 2.16 without changes (not strictly true, since we always call "gcc -x c++", so I changed distcc to allow that).
>
> o pcons spawns parallel builds per-library. In other words, it parallelizes the compilations of all files that will be collected into a library before starting on the next library.
>
> o When I do a plain build with -j1 and NO distcc, the time to compile is 4:02:11 (4 hrs, 2 minutes, 11 seconds).
>
> OK: there is the setup. Now, some data. I've attached an Excel spreadsheet which contains some numbers and very informative graphs. If someone wants the data in a different format (that I can generate), please let me know. I'd be happy to do it, for the good of the cause.
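[Editor's note: the per-library scheduling that pcons does, as described above, can be sketched in a few lines of Python. This is a toy reconstruction, not pcons code: `compile_file` stands in for the real "distcc gcc -x c++" invocation, and the library names are made up.]

```python
from concurrent.futures import ThreadPoolExecutor

def compile_file(path):
    # Stand-in for a real "distcc gcc -x c++ -c <path>" invocation.
    return path + ".o"

def build_libraries(libraries, jobs=20):
    """Compile each library's sources in parallel (up to `jobs` at once),
    finishing one library before starting the next, as pcons does."""
    objects = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:  # the -j limit
        for lib, sources in libraries.items():
            # pool.map blocks until every file in this library is compiled,
            # so the next library's compiles never overlap with this one's.
            objects[lib] = list(pool.map(compile_file, sources))
    return objects

libs = {"libfoo.a": ["a.cc", "b.cc"], "libbar.a": ["c.cc"]}
print(build_libraries(libs, jobs=4))
```

One consequence of this scheme is visible in the directory counts above: a directory with 548 files can keep -j20 busy, while the 97 directories with 20 or fewer files each cannot.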
> Explanations of the data/graphs:
>
> o "6 simultaneous" means that I ran 6 independent compiles, each from a different machine, each with its own DISTCC_DIR set, all using the same compilation farm.
>
> o "Parallelism" is the -j value for the build.
>
> o "1 at a time" means I ran only 1 compilation at a time, using the compilation farm.
>
> o "6 with only fastest hosts" means I ran the builds with only the fastest host set. Dan Kegel has been suggesting that I should just get rid of the slower build hosts. I wanted to test his opinion. :-)
>
> My conclusions from the top set of data:
>
> o All the builds are MUCH faster using parallelization than without: recall that with -j1, the time was about 4 hours. Thank you, Martin Pool, for distcc.
>
> o -j20 produced the fastest builds of the -j values that I tried. This result may counter some opinions that say that "massively parallel" builds (with -j > 10) do not produce any benefit. For me, they did. See the FAQ: http://distcc.samba.org/faq.html#dash-j for more info.
>
> o -j24 was slower than -j20 because more time was spent blocked waiting for CPUs to become free. This implies that it is good to have more hosts in the compilation farm.
>
> o Comparing the "6 with only fastest hosts" data to the "6 simultaneous" data, we see that if you are guaranteed to be the only person using the compilation farm, then indeed it is best to have only the fastest hosts in the farm. However, as soon as more than 1 compilation is run at once, it is better to have the slower hosts in the compilation farm. This is because with only the fastest hosts in the farm, there is significant contention for CPUs with multiple simultaneous builds.
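[Editor's note: the speedup implied by the numbers above is easy to check. The 4:02:11 baseline is stated in the post; the ~46-minute -j20 time is read off the reported results, and the resulting 5.3x figure is just this division, not a number from the post.]

```python
# Arithmetic behind the headline numbers: -j1 with no distcc took 4:02:11,
# and the best run of the -j values tried (-j20) took roughly 46 minutes.
def to_seconds(hours, minutes, seconds=0):
    return hours * 3600 + minutes * 60 + seconds

baseline = to_seconds(4, 2, 11)   # plain -j1 build, no distcc
best_j20 = to_seconds(0, 46)      # approximate -j20 time from the graphs
print(round(baseline / best_j20, 1))  # -> 5.3
```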
> Looking at the second set of data:
>
> o Having concluded that for me the -j20 value produced the best compilation times, I wanted to see if I could make multiple simultaneous builds cooperate in sharing the compilation farm CPUs, so that contention for CPUs would be minimized. The only way I could figure out how to do this was to use Scott Lystig Fritchie's suggestion, and build a single "host server" that would keep track of the hosts in the compilation farm and hand them out to distcc runs. So, I built a system similar to what Scott described in his email, and for which he gave Tcl code. But I've worked with Tcl for many years and am not fond of it, so I rewrote it in my new favorite language, Python. My host server reads the "hosts" file and a "hosts-info" file to get the list of hosts, how many CPUs each has, and how fast each host is (what I call a "power index"). It also listens on 4 well-known TCP ports: 1) for reports from the hosts indicating their current load average; 2) for indications that a host is up or down (available or unavailable); 3) for requests from a host-server monitoring program, to which it sends its current host database; and 4) for requests for hosts from compilation executions, to which it responds with the hostname of an available host (a host with an available CPU).
>
> o Each compilation is "gethost.py distcc gcc -x c++ args args args". gethost.py connects to the host server and gets a hostname in response. It sets the hostname in DISTCC_HOSTS and calls "distcc gcc -x c++ args args args...". When the host server has no CPUs available, gethost.py will get "" back, and thus distcc will run the compilation on localhost.
>
> o The host server always hands out the fastest available compilation hosts (actually CPUs) first. The host server uses the load average messages from the compilation hosts to adjust the power of each CPU in its database.
> When a host is running at full capacity (load average / number of CPUs > 1.0), the host is avoided for a while, until the load average shows the machine is able to handle some more compilations again.
>
> o I've found that my algorithm keeps the fastest hosts very busy -- with adjusted load averages hanging just below capacity for the most part. I'm very happy with that result. The algorithm also takes into account machines that are being used for other purposes, as our compilation machines are generally available for people to use for running Xvnc, xterms, emacs, etc.
>
> OK: on to the results. I wanted to see how scalable my solution was. I'm thrilled with my early results:
>
> o You can see from the graph that the basic build, with one simultaneous compilation at -j20, takes about the same time as in the previous graph (~45 minutes vs. 46 minutes). So this confirms (in my mind, at least) that the data is pretty solid.
>
> o As I scale up from 1 simultaneous build to 6 simultaneous builds, the times go up consistently, as I expected. Again, this confirms that my testing setup is pretty reliable. And, good news: the times go up quite slowly!
>
> o The scalability of the system seems really good to me: ~45 minutes when there is one person using the compilation farm vs. 57 minutes when there are 6 simultaneous builds. That is better than I expected.
>
> o The "number of times no machine is available" data seem to indicate that adding more machines to the compilation farm, especially when the farm is quite heavily loaded, would benefit build times. This would reduce the number of times that every CPU in the system is in use, and would reduce the number of localhost builds.
>
> o (Note: I did one build on the localhost, with -j10, and it took 1:18:43, so this gives you some feel for how fast the localhost is.
> (It has 4 processors.))
>
> o I would like to try more scalability tests yet, with more than 6 simultaneous builds happening. This might show us some more interesting data.
>
> I'd love to hear what others think of this data, and my conclusions. And I will be posting my Python code as soon as I iron out some ugliness, etc. If you want it with its ugliness intact, please let me know.
>
> Vic
>
> ATTACHMENT part 2 application/vnd.ms-excel name=parallel-build-times.xls
>
> __
> distcc mailing list http://distcc.samba.org/
> To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/distcc
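[Editor's note: the gethost.py wrapper described in the post could look roughly like the sketch below. The post doesn't give the host server's address, port, or wire format, so those are assumptions here; only the behavior — ask for a host, fall back to localhost on an empty answer, then exec the wrapped distcc command — comes from the post.]

```python
import os
import socket
import subprocess

def get_host(server):
    """Ask the host server for one free compilation host; "" means none."""
    try:
        with socket.create_connection(server, timeout=5) as sock:
            # Assumed wire format: the server replies with one hostname
            # (possibly empty) and closes the connection.
            return sock.recv(256).decode().strip()
    except OSError:
        return ""  # treat an unreachable host server like "no host free"

def run_compile(argv, server=("hostserver.example.com", 3634)):  # assumed
    host = get_host(server)
    env = dict(os.environ)
    # An empty answer makes distcc compile on localhost instead.
    env["DISTCC_HOSTS"] = host or "localhost"
    # argv is the wrapped command, e.g. ["distcc", "gcc", "-x", "c++", ...]
    return subprocess.call(argv, env=env)
```

A wrapper like this would be invoked exactly as in the post: `gethost.py distcc gcc -x c++ args args args`, with everything after `gethost.py` passed through as `argv`.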