Hi,

+1 - love the idea, def joining.

However, I suspect the OP was aiming at a publicly hosted binpkg
server.  Given the below it becomes a single server/host, we care only
about the build parameters, and I suspect the profile then has less of
an effect on the built packages.

For publicly *available* infrastructure I do however not recommend
that arbitrary people be able to upload.  We can however try to
determine from attempted-download logs which combinations of USE flags
are in demand (with hashes of stuff this becomes tricky, as even 16 USE
flags already give 65536 potential hashes, and 20 clock in at over a
million; still perfectly reversible, but once we start getting to 30+
USE flags ...).  So perhaps a way to feed back "Hey, I looked for this
combo/hash" so we don't need to reverse hashes at all.

I would definitely join such a project, and it would be greatly
beneficial if we can improve the infra to have a form of distributed
build, i.e. for private infrastructure, improve the mechanism used to
"check for binpkg availability, if not available, build it and submit
back to the binary host".  Obviously all such nodes would need to be
considered "trusted".

Kind Regards,
Jaco

On 2021/02/10 21:11, Rich Freeman wrote:
> On Wed, Feb 10, 2021 at 12:57 PM Andreas K. Hüttel <dilfri...@gentoo.org> 
> wrote:
>> * what portage features are still needed or need improvements (e.g. binpkg
>> signing and verification)
>> * how should hosting look like
> Some ideas for portage enhancements:
>
> 1.  Ability to fetch binary packages from some kind of repo.
> 2.  Ability to have multiple binary packages co-exist in a repo (local
> or remote) with different build attributes (arch, USE, CFLAGS,
> DEPENDS, whatever).
> 3.  Ability to pick the most appropriate binary packages to use based
> on user preferences (with a mix of hard and soft preferences).
>
> One idea I've had around how #2-3 might be implemented is:
> 1.  Binary packages already contain data on how they were built (USE
> flags, dependencies, etc).  Place this in a file using a deterministic
> sorting/etc order so that two builds with the same settings will have
> the same results.
> 2.  Generate a hash of the file contents - this can go in the filename
> so that the file can co-exist with other files, and be located
> assuming you have a full matching set of metadata.
> 3.  Start dropping attributes from the file based on a list of
> priorities and generate additional hashes.  Create symlinks to the
> original file using these hashes (overwriting existing symlinks or
> not, based on policy).  This allows the binary package to be found
> using either an exact set of attributes or a subset of higher-priority
> attributes.  This is analogous to shared object symlinking.
> 4.  The package manager will look for a binary package first using the
> user's full config, and then by dropping optional elements of the
> config (so maybe it does the search without CFLAGS, then without USE
> flags).  Eventually it aborts based on user prefs (maybe the user only
> wants an exact match, or is willing to accept alternate CFLAGS but not
> USE flags, or maybe anything for the arch is selected).
> 5.  As always the final selected binary package still gets evaluated
> like any other binary package to ensure it is usable.
>
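Just to check I follow steps 1-3, is something like this Python sketch
what you picture?  (The attribute names, the priority order and the
file naming are made up here, purely to make the idea concrete.)

import hashlib
import os

# lowest-priority attributes first: these get dropped first when
# generating the fallback hashes
PRIORITY = ["CFLAGS", "USE", "DEPEND", "ARCH"]

def serialize(attrs):
    # deterministic text form: sorted keys, sorted values
    lines = []
    for key in sorted(attrs):
        value = attrs[key]
        if isinstance(value, (set, list, tuple)):
            value = " ".join(sorted(value))
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

def attr_hash(attrs):
    return hashlib.sha256(serialize(attrs).encode()).hexdigest()

def publish(pkgfile, attrs, repodir):
    # store the binpkg under its full hash, then symlink reduced hashes
    full = os.path.join(repodir, f"{attr_hash(attrs)}.tbz2")
    os.replace(pkgfile, full)
    reduced = dict(attrs)
    for key in PRIORITY[:-2]:           # never drop DEPEND or ARCH
        reduced.pop(key, None)
        link = os.path.join(repodir, f"{attr_hash(reduced)}.tbz2")
        if os.path.lexists(link):
            os.remove(link)             # policy here: newest build wins
        os.symlink(os.path.basename(full), link)
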
> Such a system can identify whether a potentially usable file exists
> using only the filename, cutting down on fetching.  In the interests of
> avoiding useless fetches we would only carry step 3 reasonably far -
> packages would have to match based on architecture and any dynamic
> linking requirements.  So we wouldn't generate hashes that didn't
> include at least those minimums, and the package manager wouldn't
> search for them.
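
And the lookup side of step 4 would then be the same sequence of
hashes tried from most to least specific, stopping wherever the user's
preferences say to stop.  Reusing serialize()/attr_hash()/PRIORITY
from the sketch above:

def find_binpkg(fetch, attrs, allow_drop=("CFLAGS",)):
    # fetch(hash) returns a package or None; allow_drop is the list of
    # attributes the user is willing to ignore, in drop order
    candidate = dict(attrs)
    pkg = fetch(attr_hash(candidate))
    if pkg is not None:
        return pkg
    for key in allow_drop:
        candidate.pop(key, None)
        pkg = fetch(attr_hash(candidate))
        if pkg is not None:
            return pkg                  # still validated like any binpkg
    return None                         # give up, build from source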
>
> Obviously you could do more (if you have 5 combinations of use flags,
> look for the set that matches most closely).  That couldn't be done
> using hashes alone in an efficient way.  You could have a small
> manifest file alongside the binary package that could be fetched
> separately if the package manager wants to narrow things down and
> fetch a few of those to narrow it down further.
>
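For the closest-match case a tiny per-package manifest would indeed be
enough; something like this (manifest format invented here) once the
candidates have been fetched:

def score(wanted_use, candidate_use):
    # lower is better: number of flags that differ
    return len(set(wanted_use) ^ set(candidate_use))

def closest(wanted_use, manifests):
    # manifests: mapping of package filename -> its USE flag list
    if not manifests:
        return None
    return min(manifests,
               key=lambda name: score(wanted_use, manifests[name]))
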
> Or you could skip the hash searching and just fetch all the manifests
> for a particular package/arch and just search all of those, but that
> is more data to transfer just to do a query.  A metadata cache of some
> kind might be another solution.  Content hashes would probably
> still be useful just to allow co-existence of alternate builds.
>

