This is the preprocessing patch that I originally submitted as part of a
larger patch, now separated out. It assumes you have already applied the
randomization patch I submitted a couple of days ago.

The main notion here is that preprocessing is not free. The current code
assumes that it is, and this is why we see worse performance when using very
highly parallel makes. I believe this is a sign that something is wrong. At
worst we should be seeing equal behavior, but we are actually seeing a
performance decrease when going too parallel. The current code causes highly
variable load on the
master. I was reading the distcc archive last night and saw at least one
thread where a user complained of the system load spiking up to 11 on the
master while not going above 1 on the slave compiler boxes.

Also, tuning the parallel factor for make is currently tightly bound to the
code you are compiling. We have one library that does no better beyond -j15.
We have another library that still shows compile time improvement all the way
up to -j40. When both libraries are built by one make command, it's hard to
get the right parallelization.

Locking the preprocessing step fixes all of the above
problems. After applying this patch, on our 4 processor master compile
machine, a cpp limit of 8 keeps the load steady at around 4. This is
considerably better than the load numbers you see when using the "-l" option
to make. Make gets very bursty when using the load limiting code, at least
with our old GNU make 3.78.1.

Also, you can throw a moderately high -j at it, and it does some degree of
self tuning. For a -j40, distcc no longer tries to hold 40 slave locks.
Instead, it makes sure it has a preprocessing lock before trying to get the
slave lock. This means you get the benefit of extra slave boxes when the
compile jobs take a long time, and for shorter compile jobs, you don't hold 40
slave locks while waiting for 40 cpp0 processes to finish. Instead you hold 8
slave locks while waiting for 8 cpp0 processes to finish. They're much faster
when not competing
with 32 other preprocessors. :-)

The other change here is that the number of fallback slots is now
configurable as well. We have found that 3 is a good number for us here, even
though we are running on a 4 processor box, because running 4 simultaneous
links occasionally consumes all available memory, and then the box becomes
unusable until the 'out of memory' killer cleans something up for us. Other
users would probably want to be able to tune that as well, without having to
recompile.
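
To make the lock ordering above concrete, here is a minimal sketch in C. It
is a single-process simulation, not code from the patch: slot_pool, simulate,
limit_from_env and the tick counts are all hypothetical names and values
chosen for illustration. It also assumes that a job which cannot get both
locks simply retries on the next tick rather than blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_JOBS 256

/* Hypothetical counting lock: at most `limit` holders at any time. */
struct slot_pool { int limit, held, peak; };

static bool pool_try_acquire(struct slot_pool *p)
{
    if (p->held >= p->limit)
        return false;
    if (++p->held > p->peak)
        p->peak = p->held;
    return true;
}

static void pool_release(struct slot_pool *p)
{
    assert(p->held > 0);
    p->held--;
}

/* Read a slot limit from an environment variable (name chosen by the caller),
 * falling back to `fallback` when unset or not a number in 1..MAX_JOBS. */
static int limit_from_env(const char *name, int fallback)
{
    const char *s = getenv(name);
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return fallback;
    v = strtol(s, &end, 10);
    return (*end != '\0' || v < 1 || v > MAX_JOBS) ? fallback : (int)v;
}

enum job_state { JOB_WAITING, JOB_PREPROCESSING, JOB_COMPILING, JOB_DONE };
struct job { enum job_state state; int ticks_left; };

/* Peak number of slots ever held in each pool during a run. */
struct sim_result { int peak_cpp, peak_slave; };

/* Simulate `njobs` make jobs that all start at once (like make -jN). */
static struct sim_result simulate(int njobs, int cpp_limit, int slave_slots,
                                  int cpp_ticks, int compile_ticks)
{
    struct job jobs[MAX_JOBS];
    struct slot_pool cpp = { cpp_limit, 0, 0 };
    struct slot_pool slave = { slave_slots, 0, 0 };
    int done = 0;

    assert(njobs > 0 && njobs <= MAX_JOBS);
    assert(cpp_limit > 0 && slave_slots > 0);
    assert(cpp_ticks > 0 && compile_ticks > 0);
    for (int i = 0; i < njobs; i++)
        jobs[i] = (struct job){ JOB_WAITING, 0 };

    while (done < njobs) {
        for (int i = 0; i < njobs; i++) {
            struct job *j = &jobs[i];
            switch (j->state) {
            case JOB_WAITING:
                /* Preprocessing lock first, slave lock second. */
                if (!pool_try_acquire(&cpp))
                    break;
                if (!pool_try_acquire(&slave)) {
                    pool_release(&cpp);   /* retry on a later tick */
                    break;
                }
                j->state = JOB_PREPROCESSING;
                j->ticks_left = cpp_ticks;
                break;
            case JOB_PREPROCESSING:
                if (--j->ticks_left == 0) {
                    pool_release(&cpp);   /* cpp done, slave still held */
                    j->state = JOB_COMPILING;
                    j->ticks_left = compile_ticks;
                }
                break;
            case JOB_COMPILING:
                if (--j->ticks_left == 0) {
                    pool_release(&slave);
                    j->state = JOB_DONE;
                    done++;
                }
                break;
            case JOB_DONE:
                break;
            }
        }
    }
    return (struct sim_result){ cpp.peak, slave.peak };
}
```

For example, simulate(40, 8, 40, 2, 10) models a -j40 build with a cpp limit
of 8: no more than 8 jobs ever preprocess at once, while long compiles can
still spread across more slave slots. A limit could be read at startup with
something like limit_from_env("EXAMPLE_CPP_SLOTS", 8), where the variable
name is made up for this sketch.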
patch_2005_04_01.cpp_locking
__
distcc mailing list            http://distcc.samba.org/
To unsubscribe or change options:
https://lists.samba.org/mailman/listinfo/distcc
