On raw strings in commit field

Liliana Marie Prikler Tue, 28 Dec 2021 12:55:38 -0800

Hi Guix,

when Ricardo recently added guile-aiscm to Guix, I was confused that
both the version field of the package and the commit field of the git-
reference used in its origin.  It turns out, that this is a rare
pattern observed in less than 200 packages currently in Guix.   The
reason to do so (as far as I understand and was explained to me in IRC)
is that commit tags are in principle mutable and hence can not be
relied on when fetching sources.  I do have a few issues with that
explanation, but before that let's go a step back and discuss the
relation of version and commit.


Consider a package being added or updated in Guix.  At the time of
commit, we have the tag v1.2.3 pointing towards commit deadbeef.  We
therefore create a guix package with version "1.2.3" pointing to said
commit (either directly or indirectly).  At this point, one of the
following holds:
  (1) Guix "1.2.3" -> upstream "v1.2.3" -> upstream "deadbeef"
  (2) Guix "1.2.3" -> upstream "deadbeef" <- upstream "v1.2.3"
>From either, we can follow that Guix "1.2.3" = upstream "v1.2.3".  If
upstream keeps their tags around, then both forms are equivalent, but
(1) is more convenient; it allows us to derive commit from version,
which is often done through an affine mapping.

Problems arise, when upstreams move or delete tags.  At this point,
guix packages that use them break and are no longer able to fetch their
source code.  Raw commits are in principle resilient to this kind of
denial of service; instead upstreams would have to actually delete the
commits themselves, including also possible backups such as SWH to
break it.  There is certainly an argument for robustness to be made
here, particularly concerning `guix time-machine', though as noted it
is not infallible.  

It should be noted, that in the case of moving or deleted tags, the
assertion Guix "1.2.3" = upstream "v1.2.3" no longer holds.  Widespread
use of this pattern under the above reasoning would imply that those
upstreams can't be trusted to have stable tags when there are probably
few offenders in that category (considering also that Guix is not the
only tool they'd break if they do move or delete tags).  More
importantly, if we do have a non-trustworthy upstream, it could be
reasoned that referring to some tag is as good as referring to a random
commit and thereby let-bound commits and revisions ought to be used.

As any good Sith would, the above talks in absolutes, or at the very
least uses default logic without considerable fallbacks.  On the note
of fallbacks, we do also have the issue that Guix fails on the first
download that does not match the hash instead of e.g. continuing to SWH
to fetch an archive of the old tag (as well as other fallback-related
issues, also including the "Tricking Peer Review" thread).  Putting
those aside for a while, there is an all but endless amount of
upstreams for which we can't tell ahead of time whether they will act
nicely or not.  The status quo for most of our packages is to assume
that they do and fail loudly if they don't.  The proposed alternative
is to assume they don't and miss out on nice things if they do. 
However, even under that assumption we also miss out on ninja version
bumps and the only way of noticing other than paranoid amounts of
checking whether the tag moved would be to wait for a mail from
upstream claiming that they actually wanted us to notice the ninja
bump.

Neither of the above is really satisfactory.  At the very least, if raw
strings are to be used in the commit fields for tags that "once
existed, but maybe no longer point to that commit", I'd want a comment
like the ones I find in minetest.scm to mentally prepare me for what
I'm about to read in the rest of the package description, but I'd much
prefer using let-bound commit/revision pairs.  Perhaps we could make
revision "0" (alternatively #f if we don't want current versions to
break) special in that a git-version with it expands to just version.  

Long-term, we might want to support having multiple <git-references> in
git-fetch -- if the first one fails due to a hash mismatch, we would
warn about that instead of producing an error and thereafter continue
with the second, third, etc. similar to how we currently have mirror://
urls for some well-known mirrored repositories.  That way, we have a
system to warn us about naughty upstreams while also providing
robustness for the time machine.

What do y'all think?

On raw strings in commit field

Reply via email to