Hey Robin,

Sorry for the delay in getting back to you. As mentioned on IRC, both of
your messages bounced earlier, and I was at a conference all last week.
Catching up with this thread now...

On Wed, Apr 06, 2022 at 05:23:25PM +0000, Robin H. Johnson wrote:
> On Wed, Apr 06, 2022 at 02:15:02AM +0200, Jason A. Donenfeld wrote:
> > 2) Comparability: other distros use SHA2-512, as well as various
> > upstreams, which means we can compare our hashes to theirs easily.
> Can we expand on this specific thread for a moment?
> 
> I was the author of GLEP59 about changing the Manifest hashes, and I
> noted at the time, with references, that the effective strength of a set
> of hashes is only that of the strongest hash.
> 
> One of my regrets from GLEP59 is that it's made it harder for use cases
> outside of the normal user distfile workflow.
> 
> The use case that impacted me the most was being able to compare our
> distfiles were over time vs external sources, esp. if the file goes
> missing or was fetch-restricted and we can't produce a new hash of it.
> Maybe upstream only ever published SHA1/SHA256, and we only ever
> calculated SHA512/BLAKE2b on the file. Since we never had hashes from
> both sides at the same time, we cannot prove it was the same file.
> 
> We need to be able to ship one or more hashes to users, for the specific
> use case of validating the distfiles they download.
> 
> As a developer, I'd like to be able to track the other hashes for a
> file, without forcing ourselves to retain the file. This might be to
> compare with upstream published hashes, or to compare with other
> distros.
> 
> In fact it would be really nice to have a semi-automated pipeline to
> plug in signed upstream hashes to our Manifests, and make it possibly to
> prove our new SHA512/BLAKE2B hash was taken over the correct input in
> the first place, and there wasn't any subtle supply-chain attack early
> in the packaging process.
> 
> Where would those hashes go? They don't need to be in the Manifest, or
> at the very least they don't need to be distributed via rsync to users
> (it only costs a small amount of bytes to do so).
> 
> Where else could they go? 
> - Commit messages could work.
> - Git notes to a lesser degree.
> - alternate repos?

Interesting idea. This seems orthogonal to my proposal ("just use one
hash in the manifest and call it a day; make it the same as what gpg
uses for signing to minimize moving pieces"), so I'm hesitant to go too
far down this path here, for fear of derailing the thread with a
separate feature request.

With that said, I'm not quite sure I understood everything you're asking
for. You said that you want "to have a semi-automated pipeline to plug
in signed upstream hashes to our Manifests, and make it possibly to
prove our new SHA512/BLAKE2B hash was taken over the correct input", but
at the same time you also said that you want "to be able to track the
other hashes for a file, without forcing ourselves to retain the file."
What I'm wondering is: how do you propose that we calculate a SHA-512
hash of a file and "prove it correct" using, e.g., a signed SHA-256
hash, if we don't download the whole file?
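
To make the question concrete, here is a rough Python sketch of the only
way I can see to "bridge" from a signed SHA-256 to our own SHA-512. The
function name is made up for illustration and nothing like it exists in
Portage as far as I know. Note that both digests have to be computed
over the actual bytes, which is exactly why I don't see how to avoid
having the file:

```python
import hashlib
import hmac


def sha512_if_sha256_matches(data: bytes, upstream_sha256_hex: str) -> str:
    """Return the SHA-512 of `data`, but only if its SHA-256 matches upstream.

    Hypothetical helper: `upstream_sha256_hex` stands in for a digest taken
    from an already signature-checked upstream announcement.
    """
    actual = hashlib.sha256(data).hexdigest()
    # Both digests are computed over the same bytes; there is no way to
    # derive one from the other without the input itself.
    if not hmac.compare_digest(actual, upstream_sha256_hex.lower()):
        raise ValueError("SHA-256 mismatch: refusing to record a SHA-512")
    return hashlib.sha512(data).hexdigest()
```

The SHA-512 that comes out is only as trustworthy as the upstream
SHA-256 check that gated it, and `data` still has to be the whole file.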

It sounds like what would interest you is for infra to manage some sort
of master hash database: collect the hashes published around the
internet for every file that hits distfiles, verify them, generate a
bunch more hash variants of all kinds, and then cross-verify those
against the hashes extracted from every other distro. A wild hash
verification aggregator machine, in other words. I think I can see the
utility of it. It would also unburden Manifest files, as those could
then carry just a SHA-512 hash and nothing else, making things a bit
lighter.
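
As a rough illustration of the cross-verification step (Python, with a
record layout and function name I'm inventing here, not an existing infra
tool): two hash records can only be compared if they share at least one
algorithm, which is the "never had hashes from both sides at the same
time" problem you described.

```python
from typing import Dict, Optional

# Hypothetical record layout: algorithm name -> hex digest.
HashRecord = Dict[str, str]


def compare_records(a: HashRecord, b: HashRecord) -> Optional[bool]:
    """Compare two hash records for (presumably) the same distfile.

    Returns None when the records share no algorithm and therefore cannot
    be compared, True when every shared algorithm agrees, and False when
    any shared algorithm disagrees.
    """
    common = set(a) & set(b)
    if not common:
        return None
    return all(a[alg].lower() == b[alg].lower() for alg in common)
```

A record that carries both SHA-256 and SHA-512 for the same file (say,
from another distro) is what would let an upstream SHA-256-only record
and our SHA-512-only record be linked after the fact.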


> > A reason why some people might prefer BLAKE2b over SHA2-512 is a
> > performance improvement. However, seeing as right now we're opening
> > the file, reading it, computing BLAKE2b, closing the file, opening the
> > file again, reading it again, computing SHA2-512, closing the file, I
> > don't think performance is actually something people care about. Seen
> > differently, removing either one of them will already give us a
> > performance "boost" or sorts.
> Or just only verifying the "strongest" hash gives you that boost.
> 
> I do want to check into the code that you pointed out, because I'm
> really sure much older versions of Portage did the CORRECT thing of only
> reading the file in a single pass.

Let me know if your findings are different from mine...
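
For comparison, a single-pass version would look roughly like the sketch
below. This is plain Python with hashlib, not Portage's actual code, and
the function name and chunk size are just placeholders:

```python
import hashlib


def hash_file_once(path, algorithms=("blake2b", "sha512"), chunk_size=1 << 20):
    """Compute several digests of a file while reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Read fixed-size chunks until EOF and feed each chunk to every
        # hasher, so the file is opened and read a single time.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Dropping one of the two algorithms from the tuple gives the "boost" I
mentioned without touching the read loop.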

Jason
