Re: [gentoo-dev] About EGO_SUM

2022-06-09 Thread John Helmert III
On Thu, Jun 09, 2022 at 07:49:04PM +0200, Sebastian Pipping wrote:
> On 08.06.22 22:42, Robin H. Johnson wrote:
> > EGO_SUM vs dependency tarballs:
> > [..]
> > - EGO_SUM is verifiable/reproducible from Upstream Go systems
> 
> Let's be explicit, there is a _security_ threat here: as a user of an
> ebuild, dependency tarballs now take effort in manual review just to
> confirm that the content fully matches its supposed list of ingredients.
> They are the perfect place to hide malicious code in plain sight.  Now
> with dependency tarballs, there is a new layer that by design will
> likely be chronically under-audited.  It gives me shivers, frankly.
> Previously with a manifest and upstream-only URLs, only upstream can add
> malicious code, not downstream in Gentoo.

There are many packages in ::gentoo that use tarballs of patches
written and hosted by Gentoo developers, or tarballs of source code
generated by developers themselves. A (very) rough grep shows this is
very prevalent:

~/gentoo/gentoo $ grep -r SRC_URI.*dev.gentoo.org | wc -l
2845

So this problem isn't really new. Users are required to trust Gentoo
packagers that we don't do naughty things to the source code, more or
less just like any other distribution.

Re: [gentoo-dev] About EGO_SUM

2022-06-09 Thread Sebastian Pipping

On 08.06.22 22:42, Robin H. Johnson wrote:
> EGO_SUM vs dependency tarballs:
> [..]
> - EGO_SUM is verifiable/reproducible from Upstream Go systems

Let's be explicit, there is a _security_ threat here: as a user of an
ebuild, dependency tarballs now take effort in manual review just to
confirm that the content fully matches its supposed list of ingredients.
They are the perfect place to hide malicious code in plain sight.  Now
with dependency tarballs, there is a new layer that by design will
likely be chronically under-audited.  It gives me shivers, frankly.
Previously with a manifest and upstream-only URLs, only upstream can add
malicious code, not downstream in Gentoo.

Best
Sebastian

Re: [gentoo-dev] About EGO_SUM

2022-06-08 Thread Robin H. Johnson
On Fri, Jun 03, 2022 at 01:18:08PM +0200, Florian Schmaus wrote:
> EGO_SUM is marked as 'deprecated' in go-module.eclass [1, 2]. I
> acknowledge that there are packages where the usage of EGO_SUM is very
> problematic. However, I wonder if there are packages where using
> dependency tarballs is problematic while using EGO_SUM would not be.
... [snip all the great points]
> Even more problematic is that dependency tarballs require additional
> steps that would not be required when EGO_SUM is used. While those steps
> appear simple, behavioral theory shows that even the tiniest additional
> steps have a huge impact (e.g., online shops lose a relatively large
> share of customers for each additional checkout step). If we force
> dependency tarballs for Go software, then packaging Go software just
> becomes a little bit harder.
Your above is entirely correct, and I was against the plan to introduce
dependency tarballs.

> This leads me to the question why are we actually deprecating EGO_SUM? 
> It seems like a nice alternative for Go packaging that we may want to 
> keep. But maybe I am missing something?
EGO_SUM vs dependency tarballs:
- bloats ebuilds
- bloats Manifests
- bloats metadata/md5-cache/ (SRC_URI etc)
- doesn't bloat mirrors with gentoo-unique distfiles
- EGO_SUM is verifiable/reproducible from Upstream Go systems
- fewer downloads on upgrades (only changed Go deps, not entire dep tarballs)

EGO_SUM data right now adds, to every user's system:
- 2.6MB of text to ebuilds (340k after de-dupe)
- 7MB of text to Manifests (2M after de-dupe)
- 6.4MB+ of text to metadata/md5-cache (I don't have an easy way to
  calculate the deduplicated amount here)
On the server side:
- The sum total of Go distfiles mirrored on Gentoo mirrors right now is
  only 3.4GB.
- fewer downloads

Dependency tarballs:
- Right now ~15GiB on each mirror, plus storage of the primary copy
  somewhere (dev.g.o right now, but not great)
- Conservatively if the remaining EGO_SUM packages converted to Dep
  tarballs, it would need another 8GB each of primary location and
  mirrors.
- larger downloads for users who DO want to upgrade a Go package (all
  new deps tarball even if only one or two deps changed)
- must be preserved much longer, unless we can introduce a guaranteed
  way to regenerate them for any prior ebuild.

I was trying to introduce a third option, but I haven't had the time to
write an entire GLEP.

The TL;DR is to introduce a 2nd-level Manifest+metadata file that tries
to move just the metadata out of the tree, in a way that can be
regenerated (specifically, a 1:1 reproducible creation from a given go.sum).
It DOES need to contain slightly more data than the present Manifest,
specifically a full SRC_URI entry for each file (upstream URI plus what
to rename it to on the Gentoo side).
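As a rough illustration of the "upstream URI plus what to rename it to" part, the Go sketch below turns go.sum lines into such pairs, in the spirit of what go-module.eclass already does when it builds SRC_URI from EGO_SUM. It assumes the public proxy https://proxy.golang.org and the module proxy's case encoding (each uppercase letter becomes "!" plus its lowercase form). The %2F flattening used for the Gentoo-side name, and the names fromGoSumLine and srcEntry, are hypothetical and not part of any existing Gentoo tool.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyBase is an assumption: the public Go module proxy. A real tool would
// honour GOPROXY instead of hard-coding it.
const proxyBase = "https://proxy.golang.org"

// escape applies the module proxy's case encoding: each ASCII uppercase
// letter becomes '!' plus its lowercase form ("BurntSushi" -> "!burnt!sushi").
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// srcEntry is one line of the hypothetical 2nd-level Manifest: the upstream
// URI plus the name the file gets on the Gentoo side.
type srcEntry struct {
	URI      string
	Distfile string
}

// fromGoSumLine converts a go.sum line "<module> <version>[/go.mod] <hash>".
// "/go.mod" lines only need the module's .mod file; all others need the .zip.
func fromGoSumLine(line string) (srcEntry, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return srcEntry{}, fmt.Errorf("malformed go.sum line: %q", line)
	}
	mod, ver, ext := fields[0], fields[1], ".zip"
	if strings.HasSuffix(ver, "/go.mod") {
		ver, ext = strings.TrimSuffix(ver, "/go.mod"), ".mod"
	}
	path := escape(mod) + "/@v/" + escape(ver) + ext
	return srcEntry{
		URI: proxyBase + "/" + path,
		// Hypothetical renaming: flatten '/' so every file lands directly in DISTDIR.
		Distfile: strings.ReplaceAll(path, "/", "%2F"),
	}, nil
}

func main() {
	// The h1: hash field is a placeholder; this step does not check it.
	for _, line := range []string{
		"github.com/BurntSushi/toml v1.1.0 h1:PLACEHOLDER",
		"github.com/BurntSushi/toml v1.1.0/go.mod h1:PLACEHOLDER",
	} {
		e, err := fromGoSumLine(line)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", e.URI, e.Distfile)
	}
}
```

Because the output depends only on the go.sum content, regenerating it from the same go.sum always yields the same list, which is the 1:1 reproducibility property the proposal relies on.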

The 2nd-level Manifest would be listed as SRC_URI, and be handled in
src_fetch/src_unpack: download & verify the extra distfiles against the
Manifest checksum data (and, for Golang, against go.sum checksums).

The Portage mirrordist code needs the most work in this case, as it
would need to fetch the 2nd-level Manifests so it can populate Gentoo
mastermirror with the distfiles mirrored from upstream.

The storage costs for the proposed idea:
- same 1:1 base distfile storage as EGO_SUM (i.e. upstream distfiles are
  mirrored with 1:1 content, just under different names)
- Probably 1 Metadata-Manifest file per ebuild $PVR (conceptually it
  could be split more or shared between some ebuilds/packages)
- Main tree Manifests: 1 DIST entry per Metadata-Manifest in a given package
- Main tree ebuilds: 1 line for the Metadata-Manifest in the ebuild.
- metadata/md5-cache: 1 src_uri line!
- mirrors: add the Metadata-Manifest

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


Re: [gentoo-dev] About EGO_SUM

2022-06-03 Thread Ionen Wolkens
On Fri, Jun 03, 2022 at 01:18:08PM +0200, Florian Schmaus wrote:
> EGO_SUM is marked as 'deprecated' in go-module.eclass [1, 2]. I
> acknowledge that there are packages where the usage of EGO_SUM is very
> problematic. However, I wonder if there are packages where using
> dependency tarballs is problematic while using EGO_SUM would not be.
> 
> Take for example an ebuild containing
> 
> SRC_URI="
> 	https://salsa.debian.org/baz/${PN}/-/archive/v${PV}/${PN}-v${PV}.tar.bz2 -> ${P}.tar.bz2
> 	https://personal.site/files/gentoo/${P}-vendor.tar.xz
> "
> 
> where ${P}-vendor.tar.xz is a Go dependency tarball, containing only a 
> few Go modules. Hence EGO_SUM would contain only a few entries in this case.
> 
> I see multiple issues of using dependency tarballs in such cases.
> 
> First, my trust in a tarball created by someone and hosted somewhere is
> lower than my trust in the artifacts hosted on an official hub.
> Next, if anyone takes the time to review the contents of the dependency 
> tarball, it may only benefit Gentoo. On the other hand, if someone 
> reviews EGO_SUM artifacts, the whole Go ecosystem will benefit.

I do wonder what degree of verification is being done when these get
merged at the moment. Ideally the upstream go.sum would be checked at
build time, but it isn't (I can go around and change code in the vendor
tarball and it still builds just fine at the moment).
https://github.com/golang/go/issues/27348

If I start merging these, I guess I'd end up writing myself a script to
build my own tarball and check that it's identical to the proxied
maintainer's.

> 
> I may not know Gentoo's mirror system in detail, but I believe using
> EGO_SUM facilitates cross-package distfile sharing, whereas dependency
> tarballs will increase the space requirements and, probably more
> importantly, the load on the mirrors.
> 
> Even more problematic is that dependency tarballs require additional
> steps that would not be required when EGO_SUM is used. While those steps
> appear simple, behavioral theory shows that even the tiniest additional
> steps have a huge impact (e.g., online shops lose a relatively large
> share of customers for each additional checkout step). If we force
> dependency tarballs for Go software, then packaging Go software just
> becomes a little bit harder.
> 
> This leads me to the question why are we actually deprecating EGO_SUM? 
> It seems like a nice alternative for Go packaging that we may want to 
> keep. But maybe I am missing something?

I missed bits and pieces, but was never quite sure why this went toward
full deprecation. Just discouraging it may have been fair enough, or
(maybe?) imposing a limit at which the eclass tells you to use a vendor
tarball, so this doesn't get constantly ignored and bring us back to
square 1.

Not that I work with Go packages, so I don't have much to say here.
FWIW there is one Rust ebuild for which I'm thinking of using a vendor
tarball due to a ridiculous number of crates, while there is e.g.
media-libs/cubeb with only 12. So I'm happy I can choose (not that Rust
is as bad as Go in that regard).
 
> 
> 1: https://github.com/gentoo/gentoo/blob/9fec686abf789fdff36a90c3763d9558203cbf9a/eclass/go-module.eclass#L108
> 2: https://github.com/gentoo/gentoo/blob/9fec686abf789fdff36a90c3763d9558203cbf9a/eclass/go-module.eclass#L349-L352
> 

-- 
ionen


[gentoo-dev] About EGO_SUM

2022-06-03 Thread Florian Schmaus
EGO_SUM is marked as 'deprecated' in go-module.eclass [1, 2]. I
acknowledge that there are packages where the usage of EGO_SUM is very
problematic. However, I wonder if there are packages where using
dependency tarballs is problematic while using EGO_SUM would not be.

Take for example an ebuild containing

SRC_URI="
	https://salsa.debian.org/baz/${PN}/-/archive/v${PV}/${PN}-v${PV}.tar.bz2 -> ${P}.tar.bz2
	https://personal.site/files/gentoo/${P}-vendor.tar.xz
"

where ${P}-vendor.tar.xz is a Go dependency tarball, containing only a 
few Go modules. Hence EGO_SUM would contain only a few entries in this case.


I see multiple issues of using dependency tarballs in such cases.

First, my trust in a tarball created by someone and hosted somewhere is
lower than my trust in the artifacts hosted on an official hub.
Next, if anyone takes the time to review the contents of the dependency 
tarball, it may only benefit Gentoo. On the other hand, if someone 
reviews EGO_SUM artifacts, the whole Go ecosystem will benefit.


I may not know Gentoo's mirror system in detail, but I believe using
EGO_SUM facilitates cross-package distfile sharing, whereas dependency
tarballs will increase the space requirements and, probably more
importantly, the load on the mirrors.

Even more problematic is that dependency tarballs require additional
steps that would not be required when EGO_SUM is used. While those steps
appear simple, behavioral theory shows that even the tiniest additional
steps have a huge impact (e.g., online shops lose a relatively large
share of customers for each additional checkout step). If we force
dependency tarballs for Go software, then packaging Go software just
becomes a little bit harder.

This leads me to the question why are we actually deprecating EGO_SUM? 
It seems like a nice alternative for Go packaging that we may want to 
keep. But maybe I am missing something?


- Flow


1: https://github.com/gentoo/gentoo/blob/9fec686abf789fdff36a90c3763d9558203cbf9a/eclass/go-module.eclass#L108
2: https://github.com/gentoo/gentoo/blob/9fec686abf789fdff36a90c3763d9558203cbf9a/eclass/go-module.eclass#L349-L352