All,

Over the last couple of weeks I've been testing distcc (with pcons, a parallel cons) to see how it performs when multiple builds run at once on a single, shared, heterogeneous compilation farm, varying the level of parallelism (-j value), the number of simultaneous builds, and the number of hosts in the compilation farm. This posting contains my results. But first, some info about my setup.
o My tree contains 2480 .c/.cc files in 148 directories. 97 of those directories have 20 or fewer files each; the rest have more than 20, with two directories having over 200 each and one directory having 548.

o My compilation farm has 40 CPUs on 18 machines. The machines all run Solaris 2.x, where x is 6, 7, or 8. One machine has 8 CPUs, one has 4, and the rest have 2 or 1. They vary from pretty fast to pretty slow.

o For one of the tests, I used only my fastest machines: 23 CPUs in 8 machines.

o I use gcc 2.95.2 -- a very old version of the compiler. Don't ask why we don't upgrade. It is a long, boring story. :-(

o I used distcc 2.16, without changes (not actually true: we always call "gcc -x c++", so I changed distcc to allow that).

o pcons spawns parallel builds per-library. In other words, it parallelizes the compilations of all files that will be collected into a library before starting on the next library.

o When I do a single build with -j1 and NO distcc, the compile takes 4:02:11 (4 hours, 2 minutes, 11 seconds).

OK: there is the setup. Now, some data. I've attached an Excel spreadsheet which contains the numbers and some very informative graphs. If someone wants the data in a different format (that I can generate), please let me know. I'd be happy to do it, for the good of the cause.

Explanations of the data/graphs:

o "6 simultaneous" means that I ran 6 independent compiles, each from a different machine, each with its own DISTCC_DIR, all using the same compilation farm.

o "Parallelism" is the -j value for the build.

o "1 at a time" means I ran only 1 compilation at a time, using the compilation farm.

o "6 with only fastest hosts" means I ran the builds with only the fastest host set. Dan Kegel has been suggesting that I should just get rid of the slower build hosts; I wanted to test his opinion.
:-)

My conclusions from the top set of data:

o All the builds are MUCH faster with parallelization than without: recall that with -j1, the time was about 4 hours. Thank you, Martin Pool, for distcc.

o -j20 produced the fastest builds of the -j values I tried. This result may counter the opinion that "massively parallel" builds (with -j > 10) produce no benefit. For me, they did. See the FAQ for more info: http://distcc.samba.org/faq.html#dash-j

o -j24 was slower than -j20 because more time was spent blocked waiting for CPUs to become free. This implies that it is good to have more hosts in the compilation farm.

o Comparing the "6 with only fastest hosts" data to the "6 simultaneous" data: if you are guaranteed to be the only person using the compilation farm, then it is indeed best to have only the fastest hosts in the farm. But as soon as more than one compilation runs at once, it is better to have the slower hosts in the farm too, because with only the fastest hosts there is significant contention for CPUs among multiple simultaneous builds.

Looking at the second set of data:

o Having concluded that -j20 produced the best compilation times for me, I wanted to see if I could make multiple simultaneous builds cooperate in sharing the compilation farm's CPUs, so that contention for CPUs would be minimized. The only way I could figure out to do this was to follow <<<>>>>'s suggestion and build a single "host server" that keeps track of the hosts in the compilation farm and hands them out to distcc runs. So I built a system similar to what <<<>>> described in his email, and for which he gave Tcl code. But I've worked with Tcl for many years and am not fond of it, so I rewrote it in my new favorite language, Python. My host server reads a "hosts" file and a "hosts-info" file to get the list of hosts, how many CPUs each has, and how fast each host is (what I call a "power index").
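To give a flavor of the design before I post the real code, here is a simplified sketch of how the host server loads its host database. This is NOT my actual code: the file formats and names shown here are illustrative only (one hostname per line in "hosts"; "hostname ncpus power_index" per line in "hosts-info").

```python
# Illustrative sketch of the host-server database load (not the real code).
# "hosts" names the machines in the farm; "hosts-info" gives each machine's
# CPU count and power index.

def load_host_db(hosts_lines, info_lines):
    """Build {hostname: {"cpus": n, "power": p}} for hosts present in both files."""
    wanted = {line.strip() for line in hosts_lines if line.strip()}
    db = {}
    for line in info_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank/malformed lines
        name, ncpus, power = fields
        if name in wanted:
            db[name] = {"cpus": int(ncpus), "power": float(power)}
    return db

def fastest_first(db):
    """Hostnames ordered fastest first -- the order hosts are handed out."""
    return sorted(db, key=lambda h: db[h]["power"], reverse=True)
```

The "hand out the fastest available host first" policy mentioned below falls out of just keeping this list sorted by power index.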
The host server also listens on 4 well-known TCP ports:

1) for reports from the compilation hosts indicating their current load average;
2) for indications that a host is up or down (available or unavailable);
3) for requests from a host-server monitoring program; it answers these with its current host database;
4) for requests for hosts from compilation executions; it answers these with the hostname of an available host (a host with an available CPU).

o Each compilation is run as "gethost.py distcc gcc -x c++ args args args". gethost.py connects to the host server and gets a hostname in response. It puts that hostname in DISTCC_HOSTS and calls "distcc gcc -x c++ args args args...". When the host server has no CPUs available, gethost.py gets "" back, and so distcc runs the compilation on localhost.

o The host server always hands out the fastest available compilation hosts (actually CPUs) first. It uses the load-average reports from the compilation hosts to adjust the power of each CPU in its database. When a host is running at full capacity (load average / number of CPUs > 1.0), the host is avoided for a while, until its load average shows it can handle more compilations again.

o I've found that my algorithm keeps the fastest hosts very busy -- with adjusted load averages hanging just below capacity for the most part. I'm very happy with that result. The algorithm also takes into account machines that are being used for other purposes, as our compilation machines are generally available for people to use for Xvnc, xterms, emacs, etc.

OK: on to the results. I wanted to see how scalable my solution is, and I'm thrilled with my early results:

o You can see from the graph that the basic build, with one simultaneous compilation at -j20, takes about the same time as in the previous graph (~45 minutes vs. 46 minutes). So this confirms (in my mind, at least) that the data is pretty solid.
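Stepping back to the host server for a moment: the gethost.py wrapper described above boils down to something like the sketch below. Again, this is illustrative, not my actual script -- the port number and the newline-terminated reply are assumptions, and the real thing has more error handling.

```python
#!/usr/bin/env python
# Illustrative sketch of gethost.py: ask the host server for a host,
# point DISTCC_HOSTS at it, then exec the real compile command.
import os
import socket
import sys

HOST_SERVER = ("localhost", 4200)  # address/port here are made up

def get_host(addr=HOST_SERVER):
    """Ask the host server for one available host; "" means none is free."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect(addr)
        reply = b""
        while not reply.endswith(b"\n"):  # read one newline-terminated reply
            chunk = s.recv(1024)
            if not chunk:
                break
            reply += chunk
        return reply.decode().strip()
    finally:
        s.close()

def main():
    host = get_host()
    # An empty reply means no CPU is free; fall back to compiling locally.
    os.environ["DISTCC_HOSTS"] = host or "localhost"
    os.execvp(sys.argv[1], sys.argv[1:])  # e.g. distcc gcc -x c++ ...

if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Invocation is exactly as described above: gethost.py distcc gcc -x c++ args...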
o As I scale up from 1 simultaneous build to 6 simultaneous builds, the times go up consistently, as I expected. Again, this confirms that my testing setup is pretty reliable. And, good news: the times go up quite slowly. Yeah!

o The scalability of the system seems really good to me: ~45 minutes when one person is using the compilation farm vs. 57 minutes when there are 6 simultaneous builds. That is better than I expected.

o The "number of times no machine is available" data seem to indicate that adding more machines to the compilation farm, especially when the farm is heavily loaded, would improve build times. It would reduce how often every CPU in the system is in use, and so reduce the number of localhost builds.

o (Note: I did one build entirely on the localhost, with -j10, and it took 1:18:43, so this gives you some feel for how fast the localhost is. It has 4 processors.)

o I would like to try more scalability tests yet, with more than 6 simultaneous builds happening. That might show us some more interesting data.

I'd love to hear what others think of this data, and of my conclusions. And I will be posting my Python code as soon as I iron out some ugliness, etc. If you want it with its ugliness intact, please let me know.

Vic
parallel-build-times.xls
__
distcc mailing list    http://distcc.samba.org/
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/distcc
