Hi,

On Fri, 31 Dec 2021 at 16:19, Liliana Marie Prikler <liliana.prik...@gmail.com> wrote:

> You're also missing the part in which it currently relies on a single
> server to do all this, but there are plans to move it out to multiple
> ones, i.e. adding fallbacks/redundancy to your fallback mechanism,
> which for the record is a good idea to have.

I do not see why you guess I am missing a part. Anyway.

Redundancy adds one kind of robustness: resilience. Obviously it
helps. For sure, I want that too, because it is the straightforward,
“easy” and “quick” way to get robustness. However, this assumes that
the redundant nodes of the web of nets will still be up, or at least
enough of them to provide this… robustness.

Me too, I hope Guix will be popular and all the redundancies still
running when I am old or dead. But I will not bet on that assumption.
What Timothy is doing with Preservation of Guix, over a window of
~2 years, shows that any web of nets is really fragile. I do not see
why the one we are building around Guix would be different.

Instead of trying to get robustness by adding more and more, this
appears to me to be the occasion to rethink and try to get robustness
with less.

I agree with you that having various fallbacks is one good direction to
go. SWH is one of them because it is currently well supported (by
UNESCO for instance). But many others are worth considering too, maybe
IPFS or GNUnet.

>> It is a difficult topic to know what information the ’uri’ field
>> should contain for robust long-term; a topic with a lot of unknowns,
>> although many solutions are around, they are a strong change of
>> habits and changing my own habits is already hard, so a collective
>> change is a big collective challenge. :-)
>
> We're going back to Cantor's argument for raw commits. I'm not opposed
> to using commits as value of the commit field (let-bound commits
> reflected in the version, that is), but let's not forget that this
> robustness argument still presupposes that the (commit tag) binding is
> the point of failure.
> This probably holds to some degree for "npm-something", but we also
> have a fair amount of e.g. GNOME-related packages which we trust to
> have robust tags and the only reason we don't use mirror://gnome to
> refer to them is because it's not in GNOME mirrors (yet).

Because this point of failure for tags potentially exists, the
counter-measure would be to add more (check integrity, fall back to
other servers, etc.), and even that could be impossible if the tag
changed and the change propagated to all of them.

I am not saying either that we have to replace all the tags by commit
hashes tomorrow. My point is just that a tag in the ’uri’ field does
not appear to me to be a correct design. For sure, I agree it is
convenient, but I think it is not The Right Thing. Sadly, I do not
know what The Right Thing is – and a commit hash is probably not The
Right Thing, but it seems to me a direction to explore.

>> For instance, SWH promotes swhid instead of DOI for referencing the
>> publications. I am not sure it is really popular outside a small
>> French subgroup. ;-)
>
> Completely off-topic, but isn't part of the point of DOIs that you can
> fetch the revised paper as well? I can understand putting OpenData
> behind an SWH ID rather than a DOI, but the paper itself? Why?

If you find it off-topic, fine. My point is that DOI (extrinsic) is
known not to be The Right Thing for referencing, and that an intrinsic
identifier is really better, but it seems hard to convince people to
switch.

For instance, DOI is known to be fragile because it relies on an
external, centralized, mutable index to maintain the bijection between
the identifier and the content. If today I cite doi:123abc, then
tomorrow, when you resolve this very same identifier doi:123abc, you
have no guarantee that you get the same content. Obviously, it is not
an issue by itself, but in a scientific context where fraud exists,
once the centralized mutable index is corrupted, done!

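To make the doi:123abc scenario concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `registry` is a hypothetical in-memory stand-in for the centralized mutable index (not a real DOI API), and the helper names `git_blob_sha1` and `content_swhid` are invented for this example. The intrinsic identifier follows the SWHID style for file content, i.e. `swh:1:cnt:` followed by the Git blob hash of the bytes.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Hash content the way Git hashes a blob: SHA-1 over a header plus the bytes."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def content_swhid(content: bytes) -> str:
    """Build an intrinsic, SWHID-style identifier for file content."""
    return "swh:1:cnt:" + git_blob_sha1(content)


# Hypothetical stand-in for a centralized, mutable index (not a real DOI API).
registry = {"doi:123abc": b"the paper as I read it today\n"}

# Today: cite the same content both ways.
cited_extrinsic = "doi:123abc"
cited_intrinsic = content_swhid(registry[cited_extrinsic])

# Tomorrow: the index changes behind the very same identifier.
registry["doi:123abc"] = b"a different paper\n"
fetched = registry[cited_extrinsic]

# The extrinsic identifier still resolves and reveals nothing by itself;
# recomputing the intrinsic identifier from the bytes exposes the change.
unchanged = content_swhid(fetched) == cited_intrinsic
print(cited_extrinsic, "->", "same content" if unchanged else "content changed")
```

The check at the end needs nothing but the fetched bytes and the cited identifier, so anyone holding a copy can run it without trusting the index.
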
Because SWH-ID only depends on the content itself, it allows
decentralization and integrity checks. Do not get me wrong, I am not
equating the Git SHA-1 hash with an integrity check. :-)

Well, maybe the interested reader can have a look at:

<https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/>

All in all, I was trying to point out that this extrinsic vs intrinsic
thing is bigger than ’git-fetch’ and commit hash vs tag; the root, it
appears to me, is in exploring what the ’uri’ field should contain.
The DOI was an example to show that the topic is not easy.

>> Somehow, find some rationale –readability, matching versions, etc.–
>> and then find counter-measures of their flaws to keep extrinsic
>> values –tag, revision, etc.– is, for what my opinion is worth, not
>> the correct level or frame when thinking about robustness and
>> long-term.
>
> For what it's worth, I don't think content addressing everything
> (particularly relying on a single service to do so) is robust in the
> long term, it just introduces larger failure points. The only robust
> way of increasing robustness is to add more fallbacks and redundancies
> (and actually use them).

We disagree; especially on “only robust way” and “add more”. And from
my side, I have now laid out everything, I guess. ;-)

Cheers,
simon