Am Wed, 27 Sep 2017 02:04:12 +0200
schrieb Kai Krakow <hurikha...@gmail.com>:

> Am Mon, 25 Sep 2017 21:35:02 +1000
> schrieb Damo Brisbane <dhatche...@gmail.com>:
> 
> > Can someone point me to where I might go for a parallel @world
> > build? It is really for my own curiosity at this time. Currently I
> > stage binaries for multiple machines on a single NFS share, but the
> > assumption is to use some distributed filesystem instead. So I think
> > I just need a recipe, pointers, or ideas on how to distribute emerge
> > over an @world set. I am thinking granular first, i.e. per package
> > rather than e.g. distributed gcc within a single package.  
> 
> As others already pointed out, distcc introduces more headache than
> it solves.
> 
> If you are looking for a solution to improve package build
> performance, you will profit most from building on tmpfs.
> 
> Then, I also suggest going breadth first, i.e. building more packages
> at the same time.
> 
> Your question implies depth first, which means having more compiler
> processes running at a time for a single package. But most build
> processes do not scale out very well, for the following reasons:
> 
>   1. Configure phases are serial processes
> 
>   2. Dependencies in Makefiles are often buggy or incomplete
> 
>   3. Dependencies between source files often allow parallel
>      building only in short bursts throughout the complete
>      build and are serial otherwise
> 
> Building packages in parallel instead solves all these problems: Each
> build phase can run in parallel to every other build phase. So while
> a serialized configure phase is running or a package is being
> bundled/merged, another package can have multiple gccs running, while
> a third package maybe builds serialized due to source file deps.
> 
> Also, emerge is very IO bound. Resorting to distcc won't solve this,
> as a lot of compiler internals need to be copied back and forth
> between the peers. It may even create more IO than building only
> locally. Using tmpfs instead solves this much better.
> 
> I'm using the following settings and have 100% on all eight cores
> almost all the time during emerge, while IO is idle most of the time:
> 
> MAKEOPTS="-s -j9 -l8"
> FEATURES="sfperms parallel-fetch parallel-install protect-owned \
> userfetch splitdebug fail-clean cgroup compressdebug buildpkg \
> binpkg-multi-instance clean-logs userpriv usersandbox"
> EMERGE_DEFAULT_OPTS="--binpkg-respect-use=y --binpkg-changed-deps=y \
> --jobs=10 --load-average 8 --keep-going --usepkg"
> 
> $ fgrep portage /etc/fstab
> none /var/tmp/portage tmpfs
> noauto,x-systemd.automount,x-systemd.idle-timeout=60,size=32G,mode=770,uid=portage,gid=portage
> 
> Have either enough swap or lower the tmpfs allocation.
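[As a quick sanity check before picking the tmpfs size= value, one could compare it against RAM plus swap. This helper is my own sketch, not from the original post; it reads the standard Linux /proc/meminfo fields:]

```shell
#!/bin/sh
# Sketch: report RAM+swap so the tmpfs size= can be chosen to stay
# below it (otherwise large builds may push the box into OOM).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
total_gb=$(( (mem_kb + swap_kb) / 1024 / 1024 ))
echo "RAM+swap: ${total_gb}G - keep the tmpfs size= below this"
```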
> 
> Using the FEATURES buildpkg and binpkg-multi-instance allows reusing
> binary packages on different but similar machines. EMERGE_DEFAULT_OPTS
> makes use of this. /usr/portage/{distfiles,packages} is on shared
> media.
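[The sharing setup above boils down to a couple of make.conf variables; a minimal sketch, using the paths mentioned in the post:]

```
# /etc/portage/make.conf (sketch - adjust paths to your shared media)
PKGDIR="/usr/portage/packages"      # binary packages, shared via NFS
DISTDIR="/usr/portage/distfiles"    # source tarballs, shared via NFS
```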
> 
> Also, I'm usually building world upgrades with --changed-deps to
> rebuild reverse dependencies and update the binary packages that way.
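[Such a world upgrade might look like the following sketch; --changed-deps, --update, --deep, and --newuse are real emerge options, but the exact combination here is my assumption:]

```
# Rebuild packages whose dependencies changed, refreshing the
# binpkgs as a side effect (picked up via --usepkg on other hosts):
emerge --update --deep --newuse --changed-deps=y @world
```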
> 
> I'm not sure, though, whether running emerge in parallel on two
> machines would pick up newly appearing binpkgs during the process...
> I guess not. I usually don't do that unless the dep trees look
> independent between both machines.
> 
> If your machine cannot saturate the CPU throughout the whole emerge
> process (as long as there are parallel ebuilds running), then distcc
> will clearly not help you: it will make the complete process slower
> due to waiting on remote resources, and even increase the load. Only
> very few huge projects, with Makefile deps clearly optimized or
> specially crafted for distributed builds, can benefit from distcc.
> Most projects aren't of this type; even Chromium and LibreOffice
> aren't. Exactly those projects have way too much metadata to
> transport between the distcc peers.
> 
> But YMMV. I'd say, try a different path first.

I imagine one case where distcc could help you: if the building machine
(the one running emerge) is very constrained on system resources. But
even in that case, the much better performing option is still staging
the builds on another machine and installing binary packages on the
low-resource machine.


-- 
Regards,
Kai

Replies to list-only preferred.


Reply via email to