Hi

But, for me, even a trimmed-down Gentoo is still too large
(it has to contain all the base packages, from portage to the
toolchain, headers, etc). I'd prefer having only the essential
runtime stuff within the containers.

I'm building some embedded devices on the side using Gentoo, and my minimal builds are only a few MB. Curious why you feel you need to move away from Gentoo to get the size down?

It seems your complaint is that your Gentoo installs are full-featured, with a toolchain and portage, and you are comparing them to an installation built with a different tool that doesn't have a toolchain installed? You can do the same using Gentoo if you wish (you just need a lightweight package installer to avoid installing portage).

I think your main options are:

1) Build your base images without a toolchain or portage and use a minimal package installer to install pre-built binary packages. This seems fraught with issues in the long term though...

2) Build your base images without a toolchain, but with portage (and perhaps a very minimal python). This gives you full dependency tracking; obviously bind-mount or NFS-mount the actual portage tree to avoid the space used there. This seems workable and minimal?

3) If we are talking virtual machines, then who cares if your containers are individually quite large, given that the files in them are duplicated across all containers? Simply use an appropriate de-duplication strategy to coalesce the space and most of the disadvantages disappear. E.g. with linux-vserver you can simply hardlink all the common files across your installations and let the COW patch break a hardlink if anyone alters a file in a single instance. Or you could use aufs to mount a writeable layer over your common base VM instance. Or you could use one of the filesystems which de-duplicate files in the background (some caveats apply here to avoid memory still being used multiple times in each VM). Or under KVM there is KSM (Kernel Samepage Merging), which coalesces identical memory pages across guests.
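The hardlink idea in option 3) is easy to demonstrate: `cp -al` clones a directory tree but hardlinks the files, so each file's data (and its page-cache entry) is stored exactly once. A minimal sketch, using throwaway paths under /tmp rather than real VM roots:

```shell
# Clean slate for the demo (paths are illustrative, not real VM roots)
rm -rf /tmp/dedup-demo
mkdir -p /tmp/dedup-demo/base/bin
echo "binary contents" > /tmp/dedup-demo/base/bin/sh

# "Clone" the base install for a second guest: the directory tree is
# copied, but every file is a hardlink to the original, so the data
# exists only once on disk.
cp -al /tmp/dedup-demo/base /tmp/dedup-demo/guest1

# Both paths report the same inode number, i.e. one shared file:
stat -c %i /tmp/dedup-demo/base/bin/sh /tmp/dedup-demo/guest1/bin/sh
```

Note that plain hardlinks alone don't protect you from one guest modifying a shared file in place; that is exactly what the linux-vserver COW patch adds on top.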

Personally I think option 3) is quite interesting for a medium number of virtual machines, i.e. in the tens to hundreds: simply don't worry about it and let the OS do the work. At the hundreds-to-thousands-plus level I guess you have unique challenges, and I would be wrong to suggest a solution from the comfort of a laptop without having that responsibility, but I would have thought there was some advantage in a very rigidly deployed base OS, generated and updated very precisely.


For this we need a different approach (strictly separating the build
and production environments). Binary distros (e.g. Debian) might
be one option, but they lack the configurability and are mostly
still too large. So I'm going a different route using my own
build system - called Briegel - which was originally designed for
embedded/small-device targets.

So far I haven't had the spare time to port all the packages
required for complete server systems (most of the work is making
them all cleanly cross-compilable, as this is a fundamental
concept of Briegel). But maybe you'd like to join in and try it :)

Sounds like an interesting challenge, but I'm unconvinced you can't solve 90% of your problem within the constraints of Gentoo. That saves you a bunch of time, which could then be invested in the last 10% through more traditional means.


It does appear that managing large numbers of virtual machines is one
area where Gentoo could score very well. Interested to see any chatter on
how others solve this problem, or any general advocacy? Probably we
should start a new thread though...

I'm not sure if Gentoo really is the right distro for that purpose,
as it's targeted at very different systems (e.g. Gentoo boxes are
expected to be quite unique, beginning with different per-package
USE flags, even down to CFLAGS, etc). But it might still be a good
basis for building specific system images (let's call them stage5 ;-))

I won't disagree about where it's targeted, but just to re-iterate why I think Gentoo works well: it has a very workable binary package feature!

My way of working is to use (several) shared binary package repos, and the guests largely pull from those shared package directories. In fact what I do is keep a minimal number of shared "/usr/portage/packages" directories and mount the appropriate one into each guest type at boot time. At the moment my two main options are "32bit" and "64bit" for the package mounts, but I recently introduced a new machine type which is held back to Perl 5.8, and that guest gets its own package mount since it's obviously linking a lot of binaries differently.

So, my process is to test an update on a small number of guests, either dedicated test guests or less important live guests. If this looks good, then I run the upgrade against all other VMs of the same type, and they will update quickly from the binary packages.
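A minimal sketch of this shared-binary-package setup, assuming standard portage features (`FEATURES="buildpkg"`, `PKGDIR`) and illustrative mount paths of my own invention:

```shell
# On the build/test guest, save a binary package of everything emerged.
# In /etc/portage/make.conf:
#   FEATURES="buildpkg"
#   PKGDIR="/usr/portage/packages"

# At guest boot, mount the shared package directory appropriate to this
# machine type (the /srv/packages layout is illustrative):
mount --bind /srv/packages/64bit /usr/portage/packages

# On the remaining guests of the same type, update from the prebuilt
# binaries rather than compiling (-k = --usepkg):
emerge -uDk world
```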

Now, the icing is that this works extremely well even once you decide to lightly customise machine types. So for example my binary packages are very high level (e.g. 32/64bit), and my "profiles" would be fairly high level too, e.g. I have www-apache and www-nginx base profiles. However, a specific virtual machine running, say, nginx might itself need a specific PHP application installed, and that application might need some dependencies, which in turn might require a specific set of USE flag and version customisations.

Now, the neat thing is that the binary upgrade options are *either* to use *only* binary packages, *or* to use binary packages *if* they were built with the correct USE flags. So for example I haven't bothered to split out my packages directory to be specific to the nginx/apache machines; this causes the PHP package to be regularly rebuilt depending on whether it was last used to upgrade an nginx or an apache guest (different USE flags are needed for each). I could fix this easily enough, but it's not a problem for me and it's handled automatically through portage's binary package updates.

So the end result is that you can make efficient use of binary updates, but portage will still customise the odd package here or there where a local machine requires something which differs from the norm. To my eye this keeps most of the benefits of an RPM/DEB-style binary updater, with the flexibility of a per-machine, customised-USE-flag Gentoo installation.
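The two binary upgrade modes described above map onto two real emerge flags; a short sketch of the distinction:

```shell
# -K / --usepkgonly: install binary packages unconditionally, even when
# their recorded USE flags differ from this guest's configuration:
emerge -uDK world

# -k / --usepkg: prefer a binary package, but fall back to building from
# source when the saved package was built with different USE flags
# (this is what triggers the PHP rebuilds described above):
emerge -uDk world
```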


A setup for 100 identical webserver VMs could look like this:

* run a normal Gentoo vm (tailored for the webserver appliance),
   where you do regular updates (emerge, revdep-rebuild, etc, etc)
* from time to time take a snapshot, strip off the buildtime-only
   stuff (hmm, could turn out to be a bit tricky ;-o)
* this stripped snapshot now goes into testing vm's
* when approved, the individual production vm's are switched over
   to the new image (maybe using some mount magic, unionfs, etc)
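The snapshot-and-strip steps above can be sketched in shell. This is a toy run on throwaway directories under /tmp standing in for the golden VM's root filesystem; the list of directories stripped is an illustrative guess at the build-time-only content, not a complete recipe:

```shell
# Throwaway "golden" tree for the demo (on a real system you would
# snapshot the maintained guest's root filesystem instead):
rm -rf /tmp/golden /tmp/staging /tmp/web-image.tar.gz
mkdir -p /tmp/golden/usr/portage /tmp/golden/usr/include /tmp/golden/etc
echo "web01" > /tmp/golden/etc/hostname

# 1) Snapshot the maintained guest into a staging tree:
cp -a /tmp/golden /tmp/staging

# 2) Strip build-time-only content (portage tree, headers, build dirs):
rm -rf /tmp/staging/usr/portage /tmp/staging/usr/include \
       /tmp/staging/var/tmp/portage

# 3) Pack the result as the image handed to the testing VMs:
tar -C /tmp/staging -czf /tmp/web-image.tar.gz .
```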

This could work, and perhaps for 100 identical VMs you have enough meat to work on something quite customised anyway?

Personally, for 20-80 identical VMs running a very limited variety of web software I would go for:
- Slightly cut down gentoo VM
- Hardlinked across all instances OR single installation which is read only
- Writeable data areas mounted to their own space (/var/www, /tmp, /home, etc)

By separating the data from the OS you have a lot of flexibility to upgrade the base webserver install and mount the data back onto the new VM. With linux-vserver or other container-style virtualisation, you will find that the OS shares code segments across all virtual machines (since all files share the same inode), so memory usage should be much lower, nearer to firing up one instance of the shared app and then forking (i.e. data is duplicated, but the code segment is shared).
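A rough sketch of the hardlinked-clone-plus-separate-data layout, assuming illustrative paths (/vm, /data) and root privileges for the bind mounts:

```shell
# Create a guest as a hardlinked clone of a common base install:
# the tree is copied but every file shares the base's inode.
cp -al /vm/base /vm/guest42

# Keep the writeable data areas outside the OS tree and bind-mount
# them into the guest:
for dir in var/www tmp home; do
    mkdir -p "/data/guest42/$dir" "/vm/guest42/$dir"
    mount --bind "/data/guest42/$dir" "/vm/guest42/$dir"
done
```

Upgrading then means building a fresh base, re-cloning, and re-mounting the same /data areas onto the new tree.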


For 100+ VMs I guess I would be looking very strongly at a common read-only OS partition and container-style virtualisation.

For 20-80 near-identical VMs running a wider variety of web software, I would go for the hardlinked option with a straightforward "emerge" upgrade across them. Hardlinking keeps memory usage sane where possible, without the pain of keeping the base install absolutely identical and read-only to make the common-mount option work.


At this point I've got a question for the other folks here:

emerge has a --root option which allows (un)merging into a separate
system image. So it should be possible to unmerge a lot of system
packages which are only required for updating/building (even
portage itself), but this will still be manual - what about
dependency handling?

This is correct. In fact this is how you build a stage 1, 2, 3, etc., and how catalyst works!

The information is a bit spread out over several out-of-date wiki articles, but perhaps start with:
    http://en.gentoo-wiki.com/wiki/Tiny_Gentoo

Roughly speaking you could "freshen" your current installation with (from memory):
    ROOT="/tmp/new_build" emerge -av world

This has minor gremlins when I test it, probably due to some symlinks being created differently than if you follow the current catalyst build script through stages 1, 2, 3, etc., but roughly speaking it does the same thing, only jumping straight to the end result and building a completely new install identical to your current OS...

Even more special is that you can set an alternative portage configuration source: if you want to build your new ROOT with an alternative make.conf, /etc/portage/*, etc., then just put your new files somewhere and set PORTAGE_CONFIGROOT to point at them. Cross-compiling is also done through an extension of this basic method.
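Combining ROOT and PORTAGE_CONFIGROOT, a sketch of building a fresh image with its own configuration (the /vm/newconfig tree is an illustrative name; its layout mirrors the usual /etc/portage):

```shell
# Alternative configuration for the new image lives in its own tree:
#   /vm/newconfig/etc/portage/make.conf
#   /vm/newconfig/etc/portage/package.use

# Build a complete fresh install into an empty ROOT using that
# configuration instead of the host's:
PORTAGE_CONFIGROOT=/vm/newconfig ROOT=/vm/new_build emerge -av world
```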

So, following your chain of thought: yes, it's not too hard to quickly generate a customised base OS installation to use for your future VMs. Further, if you wish, you can make those VMs have a reduced or missing toolchain, etc. In fact if you google a bit I think you will find some recipes for very minimal VMs using this method, where the base VM is a very minimal install...

Is there some way to drop at least parts of the standard system set,
so that e.g. portage, python, gcc, etc. get unmerged by --depclean
if nothing else (in the world set) explicitly requires them?

You are almost thinking about it all wrong.  ("There is no spoon...")

This is Gentoo, so at this more advanced level, stop thinking about a "standard system set" and instead free your mind to start with "nothing". Go read the old bootstrap-from-stage-1 instructions, plus the TinyGentoo pages, and you can quickly see that Catalyst builds your working installation by starting from a working installation, creating an empty directory, adding some minimal packages to that directory and building up from there.

So absolutely nothing stops you from just starting with an empty directory, emerging a few basic packages into it (a couple of MB), then chrooting into it and having some fun... There is *no* minimal package set; you can install whatever you want (as long as it boots). Largely the portage dependency tracker will help you pull in the minimal needed dependencies, but beware that system packages aren't generally explicitly tracked, so you may stumble across missing deps when you go really basic and omit standard system packages (just use common sense: it should be fairly obvious that if an application requires a compiler and you didn't install one, then you have a conflict of interest...)
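The empty-directory approach can be sketched in a few commands; the package list here is an illustrative minimal set of my own choosing, not an official one, and on a real system you would refine it (and the chroot) considerably:

```shell
# Start from nothing and emerge only what you want into it
# (-1 = --oneshot, so nothing is added to the new ROOT's world file):
mkdir -p /tmp/tiny
ROOT=/tmp/tiny emerge -av1 sys-libs/glibc sys-apps/baselayout sys-apps/busybox

# ...then chroot in and have a look around your few-MB system:
chroot /tmp/tiny /bin/busybox sh
```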


Have another look at Gentoo! I definitely believe that its flexibility to build highly customised packages, plus strong templating of those packages, plus a decent ability to distribute binaries of the end result, is a very strong combo! Better binary support is really the only thing missing here, but it's pretty adequate as it stands!

Good luck

Ed W

