On Wed, Aug 21, 2013 at 04:17:48PM +0200, Jan Hubicka wrote:
> Hi,
> this is my attempt to bring GCC into wonderful era of multicore CPUs :)
> It is a hack, but it seems to help quite a lot.  About 50% of WPA time is spent
> by streaming the individual ltrans .o files.  This can be easily parallelized
> by fork - we do nothing afterwards, just exit and pass the list to the linker.

One risk: if someone streams to a spinning disk, the parallel IO may add
more seeks. But I think it's a reasonable tradeoff.

We should also use a faster compressor.

> For -flto=jobserver I simply fork all 32 processes.  It may not be a disaster,
> but perhaps we should figure out how to communicate with jobserver.  At first
> glance on document on how it works, it seems easy to add. Perhaps we can even
> convince GNU Make folks to put simple helpers to libiberty?

-flto=jobserver is still broken and confuses tokens on large builds (it
ends with a read returning 0). I did some debugging recently, and I now
suspect a Linux kernel bug. I still haven't tracked it down.

Any workarounds would need make changes, unfortunately.
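
For reference, a rough sketch of what the client side of the jobserver
token protocol looks like, assuming the classic pipe form where make puts
--jobserver-fds=R,W (or --jobserver-auth=R,W in newer releases) into
MAKEFLAGS. A client reads one byte to take a job slot and writes the same
byte back when done. The function names are hypothetical, not libiberty or
GNU Make API, and the newer fifo: form is not handled. A return of 0 from
the acquire helper is the "0 read" case: the pipe hit EOF instead of
delivering a token.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: find the jobserver pipe fds in a MAKEFLAGS
   string.  Returns 0 and fills *RFD and *WFD on success, -1 if no
   "R,W" fd pair is present.  */
static int
parse_jobserver_fds (const char *makeflags, int *rfd, int *wfd)
{
  static const char *const keys[] = { "--jobserver-auth=",
                                      "--jobserver-fds=" };
  assert (rfd != NULL && wfd != NULL);
  if (makeflags == NULL)
    return -1;
  for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
    {
      const char *p = strstr (makeflags, keys[i]);
      if (p && sscanf (p + strlen (keys[i]), "%d,%d", rfd, wfd) == 2)
        return 0;
    }
  return -1;
}

/* Take one token.  Returns 1 with the token byte stored in *TOKEN,
   0 if the pipe reported EOF, -1 on error.  */
static int
jobserver_acquire (int rfd, char *token)
{
  ssize_t n;
  do
    n = read (rfd, token, 1);
  while (n < 0 && errno == EINTR);
  return (int) n;
}

/* Give the token back.  Returns 0 on success, -1 on error.  */
static int
jobserver_release (int wfd, char token)
{
  ssize_t n;
  do
    n = write (wfd, &token, 1);
  while (n < 0 && errno == EINTR);
  return n == 1 ? 0 : -1;
}
```

Each forked streaming process would acquire a token before starting and
release it on every exit path, otherwise the build loses a slot.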

> 
> We also may figure out number of CPUs (is it available i.e. from libgomp)

sysconf(_SC_NPROCESSORS_ONLN) ? 
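
Something along these lines, as a minimal sketch. default_parallelism is
a hypothetical name, not an existing GCC function, and
_SC_NPROCESSORS_ONLN is a common extension rather than strict POSIX, so
the sketch falls back to a caller-supplied value when it is missing or
the call fails.

```c
#include <assert.h>
#include <unistd.h>

/* Return the number of CPUs currently online, or FALLBACK when the
   system cannot tell us.  Hypothetical helper for illustration.  */
static long
default_parallelism (long fallback)
{
  assert (fallback > 0);
#ifdef _SC_NPROCESSORS_ONLN
  long n = sysconf (_SC_NPROCESSORS_ONLN);
  if (n > 0)
    return n;
#endif
  return fallback;
}
```
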

> and use it by default even if user do not care to pass number of processes.
> Naturally these streaming forks should be cheap memory wise. I hope Martin
> will get me some actual numbers.
> 
> With the patch the WPA time of firefox goes down to 2 minutes (4.8 needs about
> 30 minutes and without the hack one needs about 5 minutes)

Cool!

I'll try it on my builds
>  
> +fparallelism=
> +LTO Joined
> +Run the link-time optimizer in whole program analysis (WPA) mode.

The description does not make sense for this option: it describes WPA
mode, not the number of parallel streaming processes.

The rest of the patch looks good from a quick read, although I would prefer
to do the waiting for children in the "parent", not the "last one".
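
To illustrate what I mean by waiting in the parent, here is a minimal
sketch, not the patch's actual code. stream_partition is a hypothetical
stand-in for writing one ltrans file, and stream_in_parallel is likewise
an invented name. The parent forks at most max_jobs children at a time
and reaps every one of them itself, so failures are seen in one place.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for streaming one ltrans partition to disk.
   Returns 0 on success.  */
static int
stream_partition (int index)
{
  (void) index;
  return 0;
}

/* Reap one child.  Returns 0 if a child was reaped, -1 if none could
   be.  Sets *FAILED when the child did not exit with status 0.  */
static int
reap_one (int *failed)
{
  int status;
  pid_t pid;
  do
    pid = wait (&status);
  while (pid < 0 && errno == EINTR);
  if (pid < 0)
    return -1;
  if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
    *failed = 1;
  return 0;
}

/* Stream NPARTITIONS partitions using up to MAX_JOBS children at once.
   The parent does all the waiting.  Returns 0 if every child
   succeeded, -1 otherwise.  */
static int
stream_in_parallel (int npartitions, int max_jobs)
{
  int running = 0, failed = 0;
  assert (max_jobs > 0);
  for (int i = 0; i < npartitions && !failed; i++)
    {
      /* Throttle: free a slot before forking another child.  */
      while (running >= max_jobs)
        {
          if (reap_one (&failed) < 0)
            return -1;
          running--;
        }
      pid_t pid = fork ();
      if (pid == 0)
        _exit (stream_partition (i) == 0 ? 0 : 1);
      if (pid < 0)
        failed = 1;
      else
        running++;
    }
  /* Collect the remaining children in the parent.  */
  while (running > 0 && reap_one (&failed) == 0)
    running--;
  return failed ? -1 : 0;
}
```

The children use _exit so they do not run the parent's atexit handlers or
flush its stdio buffers a second time.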

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only
