Re: [gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time

Mickaël Bucas Tue, 05 Nov 2019 07:06:50 -0800

On Tue, 5 Nov 2019 at 01:02, Caveman Al Toraboran
<toraboracave...@protonmail.com> wrote:
>
>
> DISCLAIMER:  I am not claiming that this idea is new.  It is probably not new.
> -----------  Even though some of its details might be new for a Linux
>              distribution, it's all based on boring well-established bits of
>              known science.  But regardless of its newness, I think it's worth
>              sharing with the hope that it may re-kindle the fire in a nerd's
>              heart (or a group of nerds) so that they develop this for me (or
>              us).
>
>
>
> GOAL:
> -----
> Reduce compile time while still allowing ricing (e.g. fancy USE flags,
> make.conf settings, etc), without increasing dev overhead.
>
>
> CURRENT SITUATION:
> ------------------
> If you use *-bin packages, you cannot rice; to rice, you must compile on
> your own.
>
>
> THE APPROACH:
> -------------
> 1. Some nerd (or a group of nerds) makes (or make) a package, maybe call it
>    `almostfreelunch.ebuild`.
>
> 2. Say you want to compile qtwebengine.  You do: `almostfreelunch -aqvDuNt
>    --backtrack=1000 qtwebengine`.
>
> 3. The app, `almostfreelunch`, will look up your build setup (e.g. USE flags,
>    make.conf settings, etc) for all packages that would be built on your
>    system as part of installing qtwebengine.
>
> 4. The app will upload that info to a central server, which looks up the
>    popularity of certain configurations, e.g. the distribution of
>    compile-time configurations for a given package.  The central server will
>    then figure out, for example, that qtwebengine is commonly compiled for
>    x86-64 with certain USE flags and other settings in make.conf.
>
> 5. If the server figures out that the package that `almostfreelunch` is about
>    to compile is popular enough with the specific build settings about to be
>    used, the server will reply to the app and tell it "hi, upload to me
>    your bins when cooked, plz".  But if the build setting is not popular
>    enough, it will reply "nothx".  This way, the central server will not end
>    up with too many undesired binaries with uncommon build-time settings.
>
> 6. The central server will also collect multiple binary packages from multiple
>    people who use `almostfreelunch` for the same packages and the same
>    build-time options.  I.e. multiple qtwebengine binaries with identical
>    build-time settings (e.g. same USE flags, make.conf, etc).
>
> 7. The central server will perform statistical analysis against all of the
>    uploaded binaries, of the same packages and the same claimed build-time
>    settings, to cross-check those binaries to obtain a statistical confidence
>    in identifying which of the binaries is the good one, and which ones are
>    outliers.  Outliers might exist because of users with buggy
>    compilers, or malicious users that intentionally try to inject malware/bugs
>    into their binaries.
>
> 8. Thanks to information theory, we will be able to figure out how much
>    redundancy is needed in order to numerically calculate a confidence value
>    that shows how trustworthy a given binary is.  E.g. if a package, with
>    specific build-time options, has a very large number of binary submissions
>    that are also extremely similar (i.e. only differ in trivial aspects due to
>    certain randomness in how compilers work), then the central server can
>    calculate a high confidence value for it.  Else, the confidence value drops.
>
> 9. If a user invokes `almostfreelunch -aqvDuNt --backtrack=1000 qtwebengine`
>    and the central server finds that there is an already compiled package
>    with the same settings, then the server simply tells the user, and shows
>    him the confidence associated with the fitness of the binary (based on
>    calculations in steps (6) to (8)).  By default, bins with too-low
>    confidence values will be masked and proper colours will be used to
>    adequately scare the users away from low-confidence packages.
>
> 10. If at step (9) the user likes the confidence of the pre-compiled binary
>    package, the user can simply download the binary package, blazing fast,
>    with all the nice USE and make.conf flags that he has.  Else, the user is
>    free to compile his own version, and upload his own binary, to help the
>    server enhance its confidence as calculated in steps (6) to (8).
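
A minimal Python sketch of steps (3) and (6) to (8), under stated assumptions: `build_fingerprint`, `majority_error` and `majority_confidence` are hypothetical names, not part of Portage or any existing tool. Uploads are grouped by a SHA-256 hash of the sorted build settings, and two binaries "agree" only if their SHA-256 digests match, which assumes bit-for-bit reproducible builds. In place of the unspecified information-theoretic calculation, the error estimate is a plain binomial tail: the probability that at least half of k independent uploads are bad when each is bad with probability p, i.e. the worst case in which all bad uploads collude on the same binary.

```python
import hashlib
import json
from collections import Counter
from math import comb


def build_fingerprint(package, version, arch, use_flags, make_conf):
    """Hash a canonical form of the build settings (hypothetical scheme).

    Sorting the USE flags, plus sort_keys=True (which also sorts the nested
    make.conf keys), makes the result independent of listing order.
    """
    canonical = {
        "package": package,
        "version": version,
        "arch": arch,
        "use": sorted(use_flags),
        "make_conf": make_conf,
    }
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def majority_error(k, p):
    """P(at least half of k independent uploads are bad), each bad with prob. p.

    Worst case: all bad uploads collude on the same binary, so the honest
    uploads lose (or tie) exactly when ceil(k/2) or more are bad.
    """
    threshold = (k + 1) // 2  # ceil(k / 2)
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(threshold, k + 1))


def majority_confidence(digests, p_bad):
    """Return (most common digest, its share of uploads, worst-case error)."""
    counts = Counter(digests)
    digest, votes = counts.most_common(1)[0]
    return digest, votes / len(digests), majority_error(len(digests), p_bad)


if __name__ == "__main__":
    # Example values only; the version and flags are placeholders.
    fp = build_fingerprint(
        "dev-qt/qtwebengine", "5.13.2", "amd64",
        ["widgets", "alsa"], {"CFLAGS": "-O2 -pipe"},
    )
    print(fp[:16])
    # Three uploads for that fingerprint; "d1"/"d2" stand in for real digests.
    digest, share, err = majority_confidence(["d1", "d1", "d2"], p_bad=0.1)
    print(digest, round(share, 2), round(err, 3))  # d1 0.67 0.028
```

With p = 0.1, three uploads already cut the worst-case error from 10% to about 2.8%, and five cut it to under 1%. Requiring byte-identical digests is strict; a real system would have to rely on reproducible builds or on a fuzzier comparison, as step (8) hints.
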
>
>
> NOTES:
> ------
> * The statistical analysis in step (5) can also consider the compile time of
>   packages, so that the minimum popularity required for a specific package
>   build is weighted by the total build time.  This way, very slow-to-build
>   packages will end up with a lower minimum popularity than small packages.
>   Choosing the sweet-spot trade-off is a matter of optimizing the resources
>   of the central server.
>
> * The statistical analysis in steps (6) to (8) could also be further enhanced
>   by ranking individual users who upload the binaries.  Users who upload bins
>   could optionally also sign their packages, and henceforth be identified by
>   the central server.  Eventually, statistics can be used to also calculate a
>   confidence measure of how trustworthy a user is.  This can eventually help
>   the server more accurately calculate the confidence of the uploaded bins,
>   by also incorporating the past history of those users.
>
>   Sub-note 1:  Signing is optional because, thanks to information theory, we
>   don't really need signed packages in order to know that a package is not
>   an outlier.  I.e. even unsigned packages can help us figure out the
>   probability of error by simply looking at the redundancy counts.
>
>   Sub-note 2:  But, of course, signing would help, as it will allow the
>   central server's statistical analysis to also take into account which bin
>   is coming from which user.  E.g. not all users are equally trustworthy, and
>   this can help the system be more accurate in its prediction of the error
>   on the package.
>
>   Sub-note 3:  I said it already, but just to repeat: when the error becomes
>   low enough, this distributed system can potentially end up producing
>   binaries that match or exceed those of trusted Gentoo devs.  Adding common
>   heuristic checks is optional, but can make the bins even more likely to
>   beat manual devs.
>
> * Eventually, this statistical approach could also replace the manual
>   election of binary package maintainers with a principled statistical
>   process.  Thanks to the way stuff works in nature, this system has the
>   potential of being even more trustworthy than the most trusted
>   bin-packager developer.
>
> * In the future, this could be extended to source-code ebuilds, too,
>   ultimately reaching a quality equal to, or exceeding that of, the current
>   manual system.  This may pave the path to a much more efficient operating
>   system where less manual labour is needed by the devs, so that devs can
>   do more fun things than packaging boring stuff.
>
> * This system will get better the more people use it, and the better it gets
>   the more people will like it, and hence even more will use it!  It works
>   like turbo-charging.  Hence, if this succeeds, we may market Gentoo as the
>   first "turbo-charged OS"!
>
> * Based on step (5), the server can set frequency thresholds in order to
>   reserve its resources for highly demanded packages.
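
The first two notes can be sketched the same way. Again, this is a hypothetical illustration rather than a specification: `min_popularity` assumes the server asks for uploads once popularity times build time reaches a fixed budget of saved build minutes (so slow builds qualify with fewer requesters), and `weighted_vote` assumes each signed uploader carries a trust score between 0 and 1, with unsigned or unknown uploaders getting a fixed default weight. The 600-minute budget and the 0.5 default are arbitrary placeholders.

```python
from collections import defaultdict
from math import ceil


def min_popularity(build_minutes, saved_minutes_target=600.0, floor=2):
    """Requesters needed before the server asks for binaries (hypothetical rule).

    A configuration qualifies once popularity * build time reaches
    saved_minutes_target, so slow builds need fewer requesters. `floor`
    keeps at least a few independent uploads for cross-checking.
    """
    return max(floor, ceil(saved_minutes_target / build_minutes))


def should_request_upload(requesters, build_minutes, **kwargs):
    """Server-side "hi, upload to me your bins" vs. "nothx" decision."""
    return requesters >= min_popularity(build_minutes, **kwargs)


def weighted_vote(uploads, trust, default_weight=0.5):
    """Pick the binary digest with the largest total uploader trust.

    uploads: non-empty list of (uploader_id or None, digest); None = unsigned.
    trust:   mapping uploader_id -> score in [0, 1].
    Unsigned uploads and signed uploaders with no history get default_weight.
    Returns (winning digest, its share of the total weight).
    """
    totals = defaultdict(float)
    for uploader, digest in uploads:
        weight = default_weight if uploader is None else trust.get(uploader, default_weight)
        totals[digest] += weight
    best = max(totals, key=totals.get)  # ties go to the digest seen first
    total = sum(totals.values())
    return best, (totals[best] / total if total else 0.0)


if __name__ == "__main__":
    # A 2-minute build needs 300 requesters; a 10-hour build needs only 2.
    print(min_popularity(2), min_popularity(600))  # 300 2
    uploads = [("alice", "d1"), ("bob", "d1"), (None, "d2"), ("mallory", "d2")]
    trust = {"alice": 0.9, "bob": 0.8, "mallory": 0.1}
    best, share = weighted_vote(uploads, trust)
    print(best, round(share, 2))  # d1 0.74
```

Weighting votes by trust means one low-trust uploader plus one unsigned upload cannot outvote two trusted uploaders, while unsigned uploads still count, in the spirit of Sub-note 1.
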
>
>
> rgrds,
> cm


Hi Caveman

The Portage tree contains a few binary packages prepared by Gentoo
developers, like Firefox, Rust, LibreOffice...
"ls -d /usr/portage/*/*-bin" shows about 90 packages prepared in this
way, some of them because they are non-free, like Oracle JDK.

This means that no changes to Gentoo are necessary to accomplish what
you describe: compile the packages, write the ebuilds for the binary
packages, and publish the ebuilds in an overlay.

But the really short list above shows that it's a really complex task
because of all the dependencies and configurable elements in Gentoo. If
you just look at the output of "emerge --info", you can imagine all the
moving parts, like compiler versions and compile options, Bash, Perl,
Python, init system, USE flags (combinatorial), even human languages.
And those are just the easily visible parts!
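
To put a rough number on "combinatorial": each independent on/off USE flag doubles the number of possible builds, so a package with n freely toggleable flags has 2^n variants before any other setting is considered. The sketch below multiplies that by a few other dimensions; the function name and all the counts are made up for illustration, and real ebuilds restrict flag combinations with REQUIRED_USE, so the result is only an upper bound.

```python
from math import prod


def config_space(n_use_flags, other_choices):
    """Upper bound on distinct build configurations for one package.

    n_use_flags:   independent on/off USE flags (2 states each).
    other_choices: mapping of setting name -> number of alternatives.
    """
    return 2 ** n_use_flags * prod(other_choices.values())


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    example = {
        "compiler version": 3,
        "CFLAGS profile": 4,
        "python target": 2,
        "init system": 2,
    }
    print(config_space(20, example))  # 50331648
```

That is about 50 million possible builds for a single package. How many of them actually occur in practice is a separate question; the bound only shows how quickly the space grows.
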

I remember reading an article about a man trying to reproduce the binary
packages of a binary distribution and failing to do so, because there
are so many parts involved. I later read that distributions have done
some work towards reproducible builds, but I'm not sure how successful
they have been, even when all choices are predefined.

Given that Gentoo has taken a whole different road by having more
choices available to the user, I don't think the compilation results
of one configuration would be easily used on another.

To go even further, pushing your compiled packages to a public server
may create a security risk by exposing many parts of your
configuration that could be analyzed by malicious people.

So far I don't see a really big advantage in building this kind of
infrastructure compared to either a binary distribution or Gentoo with
home compilation.

Best regards

Mickaël Bucas
