On Wed, Aug 21, 2013 at 04:17:48PM +0200, Jan Hubicka wrote:
> Hi,
> this is my attempt to bring GCC into wonderful era of multicore CPUs :)
> It is a hack, but it seems to help quite a lot.  About 50% of WPA time is spent
> by streaming the individual ltrans .o files.  This can be easily parallelized
> by fork - we do nothing afterwards, just exit and pass the list to the linker.

One risk is that if someone streams to a spinning disk, the parallel IO
may add more seeks. But I think it's a reasonable tradeoff.

We should also use a faster compressor.

> For -flto=jobserver I simply fork all 32 processes.  It may not be a disaster,
> but perhaps we should figure out how to communicate with jobserver.  At first
> glance on document on how it works, it seems easy to add.  Perhaps we can even
> convince GNU Make folks to put simple helpers to libiberty?

lto=jobserver is still broken and confuses tokens on large builds
(ends with a 0 read). I did some debugging recently, and I suspect a
Linux kernel bug now. Still haven't tracked it down. Any workarounds
would need make changes, unfortunately.

>
> We also may figure out number of CPUs (is it available i.e. from libgomp)

sysconf(_SC_NPROCESSORS_ONLN) ?

> and use it by default even if user do not care to pass number of processes.
> Naturally these streaming forks should be cheap memory wise.  I hope Martin
> will get me some actual numbers.
>
> With the patch the WPA time of firefox goes down to 2 minutes (4.8 needs about
> 30 minutes and without the hack one needs about 5 minutes)

Cool! I'll try it on my builds.

>
> +fparallelism=
> +LTO Joined
> +Run the link-time optimizer in whole program analysis (WPA) mode.

The description does not make sense.

Rest of patch looks good from a quick read, although I would prefer
to do the waiting for children in the "parent", not the "last one".

-Andi

--
a...@linux.intel.com -- Speaking for myself only
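
For illustration only, here is a rough sketch of the shape suggested above: fork one child per ltrans partition, cap the number of children running at once, and have the parent itself reap every child. This is not code from the patch. The names stream_partition, stream_all and default_jobs are hypothetical, and stream_partition is an empty stand-in for the real streaming step. _SC_NPROCESSORS_ONLN is a common extension (glibc, BSDs, macOS) rather than a strict POSIX requirement, so the sketch falls back to 1 if the query fails.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for streaming one ltrans partition to its .o file.
   Returns 0 on success. */
static int stream_partition(int part)
{
    (void)part;
    return 0;
}

/* Default job count: number of online CPUs, or 1 if the query fails. */
static int default_jobs(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

/* Reap one child; returns 0 if it exited with status 0, -1 otherwise. */
static int reap_one(void)
{
    int status;
    if (wait(&status) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Fork one child per partition, keeping at most max_jobs alive at a time.
   The parent does all the waiting.  Returns 0 if every child succeeded. */
static int stream_all(int nparts, int max_jobs)
{
    int running = 0, failed = 0;

    if (max_jobs < 1)
        max_jobs = 1;

    for (int part = 0; part < nparts; part++) {
        /* Throttle: wait for a free slot before forking again. */
        while (running >= max_jobs) {
            if (reap_one() != 0)
                failed = 1;
            running--;
        }

        fflush(NULL);           /* do not duplicate buffered output in the child */
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)           /* child: stream one partition, then exit */
            _exit(stream_partition(part) == 0 ? 0 : 1);
        running++;
    }

    /* Parent collects all remaining children before returning. */
    while (running > 0) {
        if (reap_one() != 0)
            failed = 1;
        running--;
    }
    return failed ? -1 : 0;
}
```

A caller would do something like stream_all(nparts, default_jobs()) and only hand the file list to the linker if it returns 0. Keeping the wait in the parent means a crashed child is noticed in one place instead of depending on whichever child happens to finish last.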