This is the preprocessing patch I submitted earlier as part of a larger patch, now separated out. It assumes you have already applied the randomization patch I submitted a couple of days ago.

The main notion here is that preprocessing is not free. The current code assumes that it is, and this is why we see worse performance with very high make parallelism. I believe this is a sign that something is wrong: at worst we should see equal behavior, but we are actually seeing a performance decrease when going too parallel. The current code also causes highly variable load on the master. I was reading the distcc archive last night and saw at least one thread where a user complained of the system load spiking to 11 on the master while not going above 1 on the slave compiler boxes.

Also, tuning the make parallel factor is currently tightly bound to the code you are compiling. We have one library that does no better beyond -j15, and another that still shows compile time improvement all the way up to -j40. When both libraries are built by one make command, it's hard to get the parallelization right.

 

Locking around preprocessing fixes all of the above problems.

After applying this patch on our 4-processor master compile machine, a cpp limit of 8 keeps the load steady at around 4. This is considerably better than the load numbers you see when using make's "-l" option. Make gets very bursty when using its load-limiting code, at least with our old GNU make 3.78.1.
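To make the idea of a "cpp limit" concrete, here is a minimal sketch of a slot lock shared between processes. It is not the code from the patch: it assumes each slot is a lock file taken with a non-blocking flock(), and the names `try_acquire_slot`, `release_slot` and the "cpp" file prefix are made up for illustration.

```python
import fcntl
import os
import tempfile


def try_acquire_slot(lock_dir, name, limit):
    """Try to lock one of `limit` slot files; return an open fd, or None if all are busy."""
    for slot in range(limit):
        path = os.path.join(lock_dir, f"{name}_{slot}")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            # Non-blocking exclusive lock: fails at once if the slot is already held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            continue
        return fd
    return None


def release_slot(fd):
    """Closing the descriptor drops the lock on that slot."""
    os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        first = try_acquire_slot(lock_dir, "cpp", 2)
        second = try_acquire_slot(lock_dir, "cpp", 2)
        # With a limit of 2, a third request finds every slot busy.
        print("third request got a slot:", try_acquire_slot(lock_dir, "cpp", 2) is not None)
        release_slot(first)
        release_slot(second)
```

A caller that gets None back would normally sleep briefly and retry, so at most `limit` preprocessors run on the master at any moment no matter how high -j is.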

Also, you can throw a moderately high -j at it, and it does some degree of self-tuning. With -j40, distcc no longer tries to hold 40 slave locks. Instead, it makes sure it has a preprocessing lock before trying to get the slave lock. This means you get the benefit of extra slave boxes when the compile jobs take a long time, and for shorter compile jobs you don't hold 40 slave locks while waiting for 40 cpp0 processes to finish. Instead you hold 8 slave locks while waiting for 8 cpp0 processes to finish, and they're much faster when not competing with 32 other preprocessors. :)
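The lock ordering described above can be modelled in a few lines. This is only a toy simulation with threads standing in for distcc client processes and sleeps standing in for cpp0 and the remote compile; `run_jobs` and its parameters are invented for the example, and it assumes the preprocessing lock is dropped once cpp0 finishes.

```python
import threading
import time


def run_jobs(jobs=40, cpp_limit=8, slave_slots=40):
    """Run `jobs` fake compiles; return the peak number preprocessing at once."""
    cpp_lock = threading.BoundedSemaphore(cpp_limit)
    slave_lock = threading.BoundedSemaphore(slave_slots)
    guard = threading.Lock()
    counts = {"now": 0, "peak": 0}

    def job():
        with cpp_lock:                 # 1. take a preprocessing slot first
            slave_lock.acquire()       # 2. only then reserve a slave slot
            with guard:
                counts["now"] += 1
                counts["peak"] = max(counts["peak"], counts["now"])
            time.sleep(0.001)          # stand-in for running cpp0
            with guard:
                counts["now"] -= 1
        # The cpp slot is free again here; the slave slot is still held.
        time.sleep(0.001)              # stand-in for the remote compile
        slave_lock.release()

    threads = [threading.Thread(target=job) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["peak"]


if __name__ == "__main__":
    # Like make -j40 with a cpp limit of 8: the peak is never more than 8.
    print("peak concurrent preprocessors:", run_jobs())
```

Because a job only asks for a slave slot once it already holds a cpp slot, the number of slave slots tied up by jobs that are still preprocessing is bounded by the cpp limit rather than by -j.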

 

The other change here is that the number of fallback slots is now configurable as well. We have found that 3 is a good number for us, even though we are running on a 4-processor box, because running 4 simultaneous links occasionally consumes all available memory, and the box then becomes unusable until the out-of-memory killer cleans something up for us. Other users would probably want to tune that as well without having to recompile.
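As an illustration of tuning without recompiling, a slot count can be read at run time with a built-in default. This mail does not say how the patch exposes its settings, so the environment variable name `EXAMPLE_FALLBACK_SLOTS` and the helper `slot_limit` below are purely hypothetical.

```python
import os


def slot_limit(var, default, env=os.environ):
    """Read a positive integer slot count from `env`, else fall back to `default`."""
    raw = env.get(var)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Unparseable setting: keep the built-in default.
        return default
    return value if value > 0 else default


if __name__ == "__main__":
    # Hypothetical variable name; 4 stands in for a compiled-in default.
    print("fallback slots:", slot_limit("EXAMPLE_FALLBACK_SLOTS", 4))
```

With something like this, a site such as ours could set the value to 3 on a 4-processor box to leave memory headroom for links.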


Attachment: patch_2005_04_01.cpp_locking
