On Wednesday, 5 July 2023, 01:09:30 CEST, Oskari Pirhonen wrote:
> On Tue, Jul 04, 2023 at 21:56:26 +0000, Robin H. Johnson wrote:
> > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > > > Just out of curiosity about the whole discussion: I did not follow
> > > > it in the deepest detail, but what I got is:
> > > > - EGO_SUM blows up the Manifest file, since every little Go module
> > > >   needs to be accounted for. Many such Manifest files lead to a
> > > >   greatly increased Portage tree size. EGO_SUM is just one example
> > > >   (though the biggest one). Statically linked languages like Rust
> > > >   etc. have the same problem.
> > > > - The current solution is to prepackage all modules, put the
> > > >   archive somewhere on a webserver, and manifest just that file.
> > > >   This makes the Portage tree small again, but requires a
> > > >   webserver/mirror and is thus unfriendly for overlay devs.
> > > 
> > > > I'm not sure if it was mentioned before, but has anyone considered
> > > > hash trees / Merkle trees for the Manifest file? The idea would be
> > > > to hash the standard Manifest file a second time if it gets too
> > > > big, write down that hash as the new Manifest file, and leave
> > > > EGO_SUM as is.
> > > These are the out-of-tree/indirect Manifests that I proposed here
> > > more than a year ago:
> > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> > 
> > Developing it requires PMS work in addition to package manager
> > development, because it introduces phases.
> > 
> > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > - primary validation of distfiles
> > - secondary fetch of $SRC_URI per indirect Manifest
> > - secondary validation of additional distfiles
> > 
> > > A significantly impacted use case is "emerge -f", which would then
> > > need to run downloads twice.
> > 
> 
> I'm not sure double downloading is required. Consider a flow similar to
> this:
> 
> 1. distfiles are fetched as per the ebuild
> 2. distfiles are hashed into a temporary Manifest
> 3. temporary Manifest is hashed and compared with the hashes stored in
>    the in-tree Manifest for the direct Manifest

This is exactly what I meant. Web storage is not needed, and neither is
a second download pass. All that is needed is an additional Manifest
format for ebuilds with more than n distfiles.
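
To make the flow above concrete, here is a minimal sketch of it. This is
not how Portage works today: the function names, the choice of BLAKE2B
and SHA512, and the sort-by-file-name ordering are all my assumptions for
illustration. Only the per-line layout mimics the existing DIST entries.

```python
import hashlib
from pathlib import Path

# Assumed hash set: Manifest label -> hashlib algorithm name.
ALGOS = {"BLAKE2B": "blake2b", "SHA512": "sha512"}


def dist_line(path: Path) -> str:
    """Build one DIST-style line: file name, size, then ALGO/hash pairs."""
    data = path.read_bytes()
    parts = ["DIST", path.name, str(len(data))]
    for label, name in ALGOS.items():
        parts += [label, hashlib.new(name, data).hexdigest()]
    return " ".join(parts)


def temporary_manifest(distfiles) -> str:
    """Step 2: hash the fetched distfiles into a temporary Manifest.

    Sorting by file name is an assumed way to get a stable ordering.
    """
    lines = [dist_line(p) for p in sorted(distfiles, key=lambda p: p.name)]
    return "\n".join(lines) + "\n"


def manifest_digests(manifest_text: str) -> dict:
    """Step 3, first half: hash the temporary Manifest itself."""
    raw = manifest_text.encode("utf-8")
    return {label: hashlib.new(name, raw).hexdigest()
            for label, name in ALGOS.items()}


def verify(distfiles, expected: dict) -> bool:
    """Step 3, second half: compare with the in-tree digests."""
    return manifest_digests(temporary_manifest(distfiles)) == expected
```

Here `expected` stands for whatever the in-tree indirect Manifest would
store for this ebuild. Everything is fetched once, in step 1, and the
rest is local hashing.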


> A new Manifest format would be required in order to differentiate the
> current ones from an indirect one. This may require PMS changes,
> although I suspect amending GLEP 74 may be enough since the PMS seems
> to just refer to the GLEP for a description of Manifests.
> 
> This would also either rely on a stable ordering of Manifest contents
> when generating it or having a separate file listing in the indirect
> Manifest which corresponds to the order in the direct Manifest. For the
> latter, it should also have separate entries for different package
> versions so that every single distfile for every single version of said
> package does not need to be fetched in order to build the direct
> Manifest.
> 
> I'm imagining something along these lines:
>
>     INDIRECT true
>     PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
>     PACKAGE ...

Maybe it is reasonable to skip the distfile names altogether (or to
provide only a hash of the concatenated file names). Then the Manifest
would contain just two or three hashes, regardless of how many distfiles
the ebuild needs. Since this kind of indirect Manifest should be rarer
than the normal ones, a slightly longer processing time should not have
much impact, I would say.
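
A rough sketch of what I mean, with nothing here being an agreed format:
the function names, the NUL separator, and the rule "sort by file name
first" are assumptions I made up so that the result does not depend on
the order in which the distfiles were fetched.

```python
import hashlib


def names_digest(names, algo="blake2b"):
    """Hash the sorted, NUL-joined file names into a single value."""
    joined = "\0".join(sorted(names)).encode("utf-8")
    return hashlib.new(algo, joined).hexdigest()


def contents_digest(file_hashes, algo="blake2b"):
    """Hash the per-file hashes, taken in sorted file-name order.

    file_hashes maps distfile name -> hex digest of that distfile.
    """
    h = hashlib.new(algo)
    for name in sorted(file_hashes):
        h.update(bytes.fromhex(file_hashes[name]))
    return h.hexdigest()
```

Both functions return one fixed-length digest per algorithm, so the entry
has the same size whether the ebuild needs two distfiles or two thousand.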



> Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
> containing the distfiles (and potentially other files if a repo does not
> have thin-manifests enabled) and their hashes in the order specified
> previously.
> 
> The indirect Manifest as described above would be large-ish for a
> package that has lots of distfiles, but likely much smaller than if each
> distfile had its set of hashes stored directly.
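
For illustration, a PACKAGE line in the format imagined above could be
parsed roughly as follows. The format itself is only a proposal, and the
sketch adds assumptions of its own: the function name is made up, and the
distfile list is taken to end at the first token that is a known
algorithm label (so a distfile literally named "BLAKE2B" would confuse
it).

```python
# Assumed set of algorithm labels; used to find where the hashes start.
KNOWN_ALGOS = {"BLAKE2B", "SHA512"}


def parse_package_line(line):
    """Split a hypothetical PACKAGE line into (cpv, distfiles, hashes)."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "PACKAGE":
        raise ValueError("not a PACKAGE line: %r" % line)
    cpv, rest = tokens[1], tokens[2:]
    # Index of the first algorithm label; everything before it is a distfile.
    split = next((i for i, t in enumerate(rest) if t in KNOWN_ALGOS), len(rest))
    distfiles, pairs = rest[:split], rest[split:]
    if len(pairs) % 2:
        raise ValueError("algorithm label without a hash: %r" % line)
    hashes = dict(zip(pairs[0::2], pairs[1::2]))
    return cpv, distfiles, hashes
```

The returned distfile list gives the order in which the direct Manifest
would have to be rebuilt before hashing it.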

Without storing the file names, the Manifest file would stay the same
small size for any number of distfiles.

Gerion


> Please correct me if there's some detail I've overlooked.
> 
> - Oskari
> 
> > The rest of the posts also go into the matter of duplication within
> > EGO_SUM & the indirect Manifests: limiting the growth requires some form
> > of content-addressed layout.
> > 
> > It's absolutely something we should get developed, but it's a lot of
> > work.
> > 
> > The indirect Manifests still provide a hosting challenge for overlays.
> > 
> 
> 
> 
