Re: On raw strings in commit field

zimoun Fri, 31 Dec 2021 09:25:11 -0800

Hi,

On Fri, 31 Dec 2021 at 16:19, Liliana Marie Prikler <liliana.prik...@gmail.com> 
wrote:


> You're also missing the part in which it currently relies on a single
> server to do all this, but there are plans to move it out to multiple
> ones, i.e. adding fallbacks/redundancy to your fallback mechanism,
> which for the record is a good idea to have.

I do not see why you assume I am missing that part.  Anyway.

Redundancy adds one kind of robustness: resilience.  Obviously it helps.
For sure, I want that too because it is the straightforward, “easy”
and “quick” way to get robustness.  However, this assumes that the
redundant nodes of the web of nets will still be up, or at least enough
of them to provide this…  robustness.  I too hope Guix will be popular
and all the redundancies still running when I am old or dead.  But I
will not bet on that assumption.

What Timothy is doing with Preservation of Guix, over a window of
~2 years, shows that any web of nets is really fragile.  I do not see
why the one we are building around Guix would be different.

Instead of trying to get robustness by adding more and more, it seems
to me an occasion to rethink and try to get robustness with less.

I agree with you that having various fallbacks is one good direction to
go.  SWH is one option because it is currently well supported (by UNESCO,
for instance).  But many others are also worth considering, maybe IPFS
or GNUnet.



>> It is a difficult topic to know what information the ’uri’ field
>> should contain for robust long-term; a topic with a lot of unknowns,
>> although many solutions are around, they are a strong change of
>> habits and changing my own habits is already hard, so a collective
>> change is a big collective challenge. :-)
>
> We're going back to Cantor's argument for raw commits.  I'm not opposed
> to using commits as value of the commit field (let-bound commits
> reflected in the version, that is), but let's not forget that this
> robustness argument still presupposes that the (commit tag) binding is
> the point of failure.  This probably holds to some degree for "npm-
> something", but we also have a fair amount of e.g. GNOME-related
> packages which we trust to have robust tags and the only reason we
> don't use mirror://gnome to refer to them is because it's not in GNOME
> mirrors (yet). 

Because this point of failure for tags potentially exists, the
counter-measure would be to add more (check integrity, fall back to
other servers, etc.), and even that could be impossible if the tag
changed and the change propagated to all of them.
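
To make that counter-measure concrete, here is a minimal sketch in
Python of "check integrity, then fall back".  Everything in it is an
assumption for illustration: fetch_with_fallbacks is a hypothetical
helper, and the "servers" are in-memory stand-ins rather than real
mirrors or anything Guix actually ships.

```python
import hashlib
from typing import Callable, Iterable, Optional


def fetch_with_fallbacks(
    sources: Iterable[Callable[[], bytes]],
    expected_sha256: str,
) -> Optional[bytes]:
    """Return the first payload whose SHA-256 matches, else None."""
    for fetch in sources:
        try:
            data = fetch()
        except OSError:
            continue  # this source is down, try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
    return None


# Hypothetical in-memory "servers" standing in for real mirrors.
original = b"release 1.0\n"
retagged = b"release 1.0 (silently changed)\n"
recorded = hashlib.sha256(original).hexdigest()


def down() -> bytes:
    raise OSError("server unreachable")


# One healthy fallback is enough to recover the original content.
ok = fetch_with_fallbacks([down, lambda: retagged, lambda: original], recorded)

# The changed tag has propagated everywhere: nothing left to fall back to.
lost = fetch_with_fallbacks([down, lambda: retagged, lambda: retagged], recorded)
```

Here ok holds the original bytes while lost is None: the integrity
check detects the change, but no number of fallbacks brings the content
back once every source serves the new one.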

I am not saying either that we have to replace all the tags by commit
hashes tomorrow.  My point is just that this tag in the ’uri’ field does
not appear to me to be a correct design.  For sure, I agree it is
convenient but I think it is not The Right Thing.  Sadly, I do not know
what The Right Thing is – and a commit hash is probably not The Right
Thing either, but it seems to me a direction to explore.


>> For instance, SWH promotes swhid instead of DOI for referencing the
>> publications.  I am not sure it is really popular outside a small
>> French subgroup. ;-)
>
> Completely off-topic, but isn't part of the point of DOIs that you can
> fetch the revised paper as well?  I can understand putting OpenData
> behind an SWH ID rather than a DOI, but the paper itself?  Why?

If you find it off-topic, fine.  My point is that DOI (extrinsic) is
known not to be The Right Thing for referencing, and that an intrinsic
identifier is really better, but it seems hard to convince people to
switch.

For instance, DOI is known to be fragile because it relies on an
external centralized mutable index to maintain the bijection between
the identifier and the content.  If today I cite doi:123abc, then
tomorrow, when you resolve this very same identifier doi:123abc, you
have no guarantee that it is the same content.  Obviously, it is not an
issue by itself, but in a scientific context where fraud exists, once
the centralized mutable index is corrupted, we are done!

Because a SWH-ID depends only on the content itself, it allows
decentralization and integrity checks.
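
A minimal sketch of the difference, in Python.  The index dictionary is
a hypothetical stand-in for a central registry, and swhid_for_content is
a name made up for this example; what it computes is the Git blob
SHA-1 that the SWHID "cnt" form is built on.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Intrinsic identifier: Git blob SHA-1, in SWHID 'cnt' form."""
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


# Extrinsic: a central, mutable index maps a name to some content.
index = {"doi:123abc": b"the paper I cited\n"}  # hypothetical registry
cited_extrinsic = "doi:123abc"
cited_intrinsic = swhid_for_content(index["doi:123abc"])

# Later, the index is altered; the name still resolves.
index["doi:123abc"] = b"a different paper\n"
resolved = index[cited_extrinsic]

# The name alone cannot tell the difference, the intrinsic identifier can.
same_content = swhid_for_content(resolved) == cited_intrinsic
```

Here same_content ends up False: anyone holding the bytes can recompute
the identifier without asking the index, whichever server the bytes
came from.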

Do not get me wrong, I am not equating a Git SHA-1 hash with an
integrity check. :-)  Well, the interested reader can have a look at:

<https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/>


All in all, I was trying to point out that this extrinsic vs intrinsic
question is bigger than ’git-fetch’ and commit hash vs tag; the root of
it appears to me to be what the ’uri’ field should contain.  DOI was an
example to show that the topic is not easy.


>> Somehow, find some rationale –readability, matching versions, etc.–
>> and then find counter-measures of their flaws to keep extrinsic
>> values –tag, revision, etc.– is, for what my opinion is worth, not
>> the correct level or frame when thinking about robustness and long-
>> term.
>
> For what it's worth, I don't think content addressing everything
> (particularly relying on a single service to do so) is robust in the
> long term, it just introduces larger failure points.  The only robust
> way of increasing robustness is to add more fallbacks and redundancies
> (and actually use them).

We disagree, especially on “only robust way” and “add more”.  And from
my side, I have now laid out everything, I guess. ;-)


Cheers,
simon
