On Fri, Jan 20, 2023 at 9:51 AM Nathan Hartman <hartman.nat...@gmail.com> wrote:
>
> On Fri, Jan 20, 2023 at 7:18 AM Daniel Shahaf <d...@daniel.shahaf.name> wrote:
> >
> > Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> > > I can complete the work on this branch and bring it to a production-ready
> > > state, assuming there are no objections.
> >
> > Your assumption is counterfactual:
> >
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
> >
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
> >
> > Objections have been raised, been left unanswered, and now
> > implementation work has commenced following the original design. That's
> > not acceptable. I'm vetoing the change until a non-rubber-stamp design
> > discussion has been completed on the public dev@ list.
> >
> I think we can start by discussing some of the pros and cons.
>
> There are two separate things here but they end up being mixed
> together in the discussions:
>
> 1. Pros/cons of switching from SHA1 to another hash.
> 2. Supporting different hash types in f32.
>
> Regarding the first item:
>
> Do we need to switch from SHA1 to another hash? One con that was
> already mentioned [1] is that we'll never really be able to switch
> away from SHA1, as there are existing clients, servers, and working
> copies out there. Not only will we have to support SHA1 forever for
> backwards compatibility, but any new hash that is ever added will need
> to be supported forever as well. If we accumulate many of those, it
> might become a burden, but perhaps there will be only one new hash and
> it will be the "blessed" one for the next 20 years.
>
> There were concerns about collisions; since the space of possible
> input datasets is infinite and the hash code size is fixed and finite
> (pretty large, but very much finite), there will always be collisions
> with any hash. The significant questions are: how small is the
> probability of a collision, and (for the purposes of security) how
> hard is it to generate input data that produces a collision? The
> answer to the first question is fixed; the second one is probably
> expected to change over time, as algorithms are studied and new
> vulnerabilities are found. Which hash type do you pick, and who knows
> if a hash thought to be very strong (today) later proves easier to
> crack than one that is thought not as strong? We can only guess.
>
> Taking a step back, this discussion started because pristine-free WCs
> are IIUC more dependent on comparing hashes than pristineful WCs, and
> therefore a hash collision could have more impact in a pristine-free
> WC. "Guarantees" were mentioned, but I think it's important to state
> that there's only a guarantee of probability, since as mentioned above
> all hashes will have collisions.
>
> We already can't store files with identical SHA1 hashes, but AFAIK the
> only meaningful impact we've ever heard is that security researchers
> cannot track files they generate with deliberate collisions. The same
> would be true with any hash type, for collisions within that hash
> type.
>
> Advantages of switching to a new hash type might include: reducing the
> already small probability of collisions; choosing an algorithm that is
> faster or that has (or is expected to have in the future) hardware
> acceleration on commodity systems, perhaps addressing user perception
> (if SHA1 is seen as old and uncool), but then again, we can't really
> get rid of SHA1...
>
> [1] https://lists.apache.org/thread/v3dv1dtod2t9yrf920h4838g2t0l94cw
>
> Regarding the second item:
>
> Since the premise of this feature is to support adding new hash types
> without bumping wc formats, it follows that any new hash type will
> create compatibility problems for clients that support f32 but not the
> specific new hash type. In light of that, it might just be better to
> bump the wc format and then you know at the outset that you need to
> upgrade your client. Just thinking out loud here but this might be
> (partly) mitigated by trying to guess which hash types we might want
> in the future and supporting them now, even if no existing client will
> actually use them, but I don't really like this idea.
>
> I'll have to return later with more thoughts...
Just quickly I want to say that although I mentioned mostly cons
above, I don't want to appear to be against switching hashes nor
against supporting multiple hash types in f32; rather, since the
i525-pod feature necessitated a format bump anyway, I do think it
makes sense to consider adding such changes now, to avoid a future
format bump, and I'm considering arguments contrary to that from a
desire to be unbiased about it.

I have more thoughts (including more pros) but have some things to
attend to now.

Looking forward to hearing others' thoughts as well.

Cheers,
Nathan