Good morning,

I want to talk about improved binary package support for Gentoo. About
1-2 months ago there was already a discussion about this on gentoo-soc@
and on Bugzilla [1]. If I remember correctly, no devs were involved in
that discussion, so I thought I would post my thoughts here.

I know that Gentoo is a source-based distribution or meta-distribution,
and I don't want to turn Gentoo into another Fedora or Ubuntu, but I
think there are some things we can learn from them.

The current situation:

Binary packages are (usually) stored
in /usr/portage/packages/$category/$package-$version.tbz2. The package
consists of the "real binary package" and the metadata (combined using
xpak or whatever).

Problems I see with this:

1) If a binary package is rebuilt because it needs to be linked against
a new library, because the USE-flags changed or because the ebuild
changed without a revision bump, the "old" binary package is
overwritten. This also means that there is no support for storing
multiple packages with different USE-flags without, well, using
different directories.
2) To find out which USE-flags a package was built with, one needs to
download the package and look at the metadata. Today I discovered a file
called "Packages" which looks like a metadata cache, but I did not find
more information about it (I only tried "man portage").

So, how can we address this?

I think we should first do something about 2). I want to propose the
following scheme:

Binary packages are stored in
$arch/$description/$category/$package/$package-$version-$ev-$use-$bv.tbz2.

$arch: This is x86, ppc or whatever you put into ACCEPT_KEYWORDS minus
 the '~'. It does not make sense to distinguish x86 from ~x86 here.
$description: Something like pentium3, core2quad, G4, or whatever.
 Pentium3-uclibc, Pentium3-solaris-prefix are also possible.
$category, $package and $version should be clear.
$ev: The "ebuild version". See below.
$bv: The "binary version". See below.
$use: The USE-flags. See below.
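
To make the scheme concrete, here is a minimal sketch in Python that
assembles such a path from its parts. The function name binpkg_path and
all example values (apache, 2.2.9, "KA" and so on) are made up for
illustration; how $ev, $use and $bv are actually derived is discussed
below.

```python
from pathlib import PurePosixPath


def binpkg_path(arch, description, category, package, version, ev, use, bv):
    """Build $arch/$description/$category/$package/
    $package-$version-$ev-$use-$bv.tbz2 from its components."""
    filename = f"{package}-{version}-{ev}-{use}-{bv}.tbz2"
    return PurePosixPath(arch, description, category, package, filename)


# Hypothetical example values, only to show the resulting layout.
path = binpkg_path("x86", "pentium3", "www-servers", "apache",
                   "2.2.9", "3", "KA", "1")
print(path)
```

Parsing such a name back into its fields is not covered here; it would
need some care because package names and versions can contain '-'
themselves.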

About ebuild version, USE-flags and binary version:

I would like to encode the USE-flags into the filename. This enables us
to have binary packages of the same version built with different
USE-flags in the same repository. Some wanted to have this in the
directory, some say it is ok to have it in the xpak only and some prefer
the "Packages"-like file.

I think USE-flags can be set per package and therefore should be stored
per package, not per $description or whatever. Having them only in the
xpak does not let you tell apart multiple binary packages of the same
version with different USE-flags, and the same is true for the Packages
file. That file would also have to be created, downloaded all the time
and so on. Therefore I think the cleanest solution is having the
USE-flags in the filename.

There are different methods to store it there.

a) A checksum (of the USE-flags, the USE-flag string, the ebuild and the
USE-flag string, whatever).
b) List the enabled USE-flags in the filename, use a) if the string gets
too long.
c) Use a packed binary vector.

I don't like a), because it is not easily reversible. You could always
download the Packages file or the binary package and look into the xpak
metadata, but that's too much effort. b) also has the problems I
mentioned for a), because it falls back to a). Also, you'd need some
system to distinguish ebuilds with the same version but different sets
of USE-flags. You also need that for c), so b) has no advantages over c)
in my eyes.

For c) I think of the following: Sort the USE-flags in some defined way
(ASCII code, whatever) and make a vector with a 1 for every enabled
USE-flag and a 0 for every disabled USE-flag. Then compress that vector:
With hex encoding you need 1 character for every 4 bits, but it should
be possible to find 64 different characters, in which case you need 1
character for every 6 bits. PHP has 106 USE-flags, which would make a
USE-string of about 18-27 characters. Packages with lots of USE-expand
stuff like languages would need more, but not too much more, I think.
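
A rough sketch of the packed vector idea in Python, to show that it is
easy to encode and to reverse. The 64-character alphabet below is my own
pick (anything filename-safe that avoids '-' would do), and encode_use /
decode_use are made-up names, not anything Portage provides.

```python
import string

# Hypothetical alphabet: 64 filename-safe characters, none of them '-'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+_"


def encode_use(all_flags, enabled):
    """Pack the on/off state of all flags (sorted by code point)
    into a string with 6 bits per character."""
    bits = [1 if flag in enabled else 0 for flag in sorted(all_flags)]
    bits += [0] * (-len(bits) % 6)  # pad with zeros to a multiple of 6
    chars = []
    for i in range(0, len(bits), 6):
        value = 0
        for bit in bits[i:i + 6]:
            value = (value << 1) | bit
        chars.append(ALPHABET[value])
    return "".join(chars)


def decode_use(all_flags, encoded):
    """Reverse encode_use; needs the same list of all flags."""
    bits = []
    for char in encoded:
        value = ALPHABET.index(char)
        bits.extend((value >> shift) & 1 for shift in range(5, -1, -1))
    # zip() stops at the last real flag, so padding bits are ignored.
    return {flag for flag, bit in zip(sorted(all_flags), bits) if bit}


flags = ["ssl", "ipv6", "doc", "threads", "ldap", "debug", "static"]
use_string = encode_use(flags, {"ssl", "ipv6"})
print(use_string, decode_use(flags, use_string))
```

With 106 flags this gives ceil(106 / 6) = 18 characters, against
ceil(106 / 4) = 27 with hex. Note that decoding only works if you know
the exact flag list of that ebuild, which is the reason the ebuild
version is needed.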

Problems: The string might get long, you get big problems with USE-flag
renames, USE-flag additions or removals. That's where the ebuild version
is needed. Or not. We have 3 possibilities:

a) Change policy: USE-flag changes in an ebuild need a version bump. 
b) Use a checksum of the ebuild.
c) Use the version given by the version control system.

The problem with a) is that it is a change in policy and probably hard
to do. Increasing the revision for a (trivial) change leads to a lot of
unnecessary rebuilds for users. It also means that USE-flag changes in
eclasses are difficult: the eclass should probably be copied over to a
new name with a version, and only ebuilds with a new version (revision)
are allowed to use it.

The problem with b) is that it is not ordered. You don't know which is
the newest version. If you have an ebuild in a version for which there
is no binary package, it gets difficult/ugly.

c) also has problems: When using cvs, version numbers are easily
available. The same is true for svn, but lots of distributed version
control systems like git use checksums as versions. Welcome back to b).
Another question is how we get at the versions. Will they stay in the
header forever, given that they make signing ebuilds or the manifest
much more complicated (multiple commits necessary)? But, well, since
metadata is generated and provided by "the tree", it should not be too
hard to add a unique ebuild version there (in the case of checksums, use
an integer and increase it whenever the checksum changes, or something).
It just might make using overlays a bit more difficult.
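
The "integer that increases whenever the checksum changes" idea could
look roughly like this in Python. This is only a sketch: the state dict
stands for whatever the metadata generation would store per ebuild, and
update_ebuild_version is a made-up name. SHA-256 is just an example
checksum.

```python
import hashlib


def update_ebuild_version(ebuild_text, state):
    """Return the integer ebuild version, bumping it if the checksum
    of the ebuild differs from the one recorded in state."""
    digest = hashlib.sha256(ebuild_text.encode("utf-8")).hexdigest()
    if state.get("checksum") != digest:
        state["checksum"] = digest
        state["version"] = state.get("version", 0) + 1
    return state["version"]


state = {}  # hypothetical per-ebuild record kept next to the metadata
print(update_ebuild_version('IUSE="ssl"', state))       # first sighting
print(update_ebuild_version('IUSE="ssl"', state))       # unchanged
print(update_ebuild_version('IUSE="ssl ipv6"', state))  # changed
```

Unlike a bare checksum, the resulting number is ordered, so it is clear
which binary package belongs to the newer ebuild.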

The last thing to be described is the binary version. Lots of people
talk about dependencies on other binary packages when they talk about
binary packages for Gentoo, but that gets quite difficult (and, in my
opinion, ugly). We mostly need to provide a "consistent set" of
packages, which means: if A depends on B, and B changes and thereby
breaks A, we need to provide an updated version of A. We can do that by
simply increasing the binary version of A, since the package manager
then knows that this package needs updating, too.
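
As a sketch of how a build server could keep the set consistent, the
following Python snippet bumps the binary version of everything that
depends on a changed package. Bumping indirect dependents as well is an
assumption on my side (it is the safe choice, but maybe more than is
needed), and the function name and data layout are made up.

```python
def bump_dependents(changed, depends_on, binary_version):
    """Increase the binary version of every package that depends,
    directly or indirectly, on the changed package.

    depends_on maps a package to the set of packages it depends on;
    binary_version maps a package to its current integer $bv."""
    # Invert the dependency map: package -> packages depending on it.
    reverse = {}
    for package, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(package)

    bumped = set()
    seen = {changed}
    stack = [changed]
    while stack:
        current = stack.pop()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                bumped.add(dependent)
                binary_version[dependent] = binary_version.get(dependent, 0) + 1
                stack.append(dependent)
    return bumped


deps = {"A": {"B"}, "C": {"A"}, "D": set()}
bv = {"A": 1, "B": 1, "C": 1, "D": 1}
print(bump_dependents("B", deps, bv), bv)
```

The changed package itself is not bumped here, since it gets a new
binary package anyway.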

How to create binary packages?

Create some build server (or build server infrastructure). The most
important thing is a script or something that provides the
functionality. One enters a make.conf, an /etc/portage dir, the path to
the profile, a description and whatever else is needed, and the system
starts building. Then you can create a second set of data and start
building again; the system puts the binary packages in the same
directory and discovers what needs to be built and what does not
(because apache needs to be built only once if its USE-flags are the
same for the different configuration sets).
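
The "discovers what needs to be built" part could be sketched like this
in Python, assuming all configuration sets share the same $description
and that a build is identified by package name plus USE-flag set. The
names plan_builds, "desktop" and "server" are made up for illustration.

```python
def plan_builds(config_sets, already_built=()):
    """Return the (package, USE-flags) pairs that still need building.

    config_sets maps a configuration name to {package: set of enabled
    USE-flags}; identical pairs are built only once."""
    seen = set(already_built)
    todo = []
    for name in sorted(config_sets):
        packages = config_sets[name]
        for package in sorted(packages):
            key = (package, frozenset(packages[package]))
            if key not in seen:
                seen.add(key)
                todo.append(key)
    return todo


config_sets = {
    "desktop": {"apache": {"ssl"}, "php": {"cli"}},
    "server": {"apache": {"ssl"}, "php": {"cli", "apache2"}},
}
for package, use in plan_builds(config_sets):
    print(package, sorted(use))
```

Here apache is built once because both sets use the same USE-flags,
while php is built twice.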

But there are thousands of packages and millions of USE-flag
combinations!

Seriously, who cares? The goal of this project (as it exists in my head)
is not to provide everything. It is to provide the most used packages.
If you need parrot, compile it yourself. If you need netbeans, compile
it yourself. We have @system, gnome, kde and another handful of
packages, which will change over time. I'm really looking forward to
the data collected by the statistics project (GSoC).

The same is true for USE-flags: We might provide gnome, kde, both, a
server profile and whatever we decide to provide, but not everything.
Again, statistics will help.

Same with CFLAGS. Probably no -O3, no -ffast-math, no -break-my-code or
whatever. Probably x86 with 32 and 64 bit for the beginning, later maybe
more.

So, the really really cool thing is that if you are some company,
university, institution or freak with lots of (similar) Gentoo boxes,
you can set up a build server and even share the binary packages, if you
want. It is the same level of security as non-official overlays, but if
the University of FooBar in Jamaica uses it, there should not be too
many security problems.

Thanks for reading, please discuss. I probably forgot lots of stuff, but
I can add it later in the discussion.

Philipp




[1] https://bugs.gentoo.org/show_bug.cgi?id=150031

