On 24 Feb 2017, at 15:46, Stefan Sperling wrote:
>
> I believe we should prepare a new working format for 1.10.0
> which addresses this problem. I don't see a good way of fixing
> it without a format bump. The bright side of this is that it
> gives us a good reason to get 1.10.0 ready ASAP.
>
> We can switch to a better hash algorithm with a WC format
> bump.
One of the previous messages mentioned that better hash algorithms are more expensive. So let me mention a tactic that I used many years ago, when MD5 was the best digest algorithm that I knew of, and I didn't trust it for the larger files I was working with at the time.

Instead of going with a completely different hash algorithm, just double down on the one you're using. What I did was to calculate one digest the standard way, and then a second one which summed up every other byte (or every 3rd byte, or ...). So to get a collision, not only do two files have to get the same digest result for all their data, but they also have to get the same digest result when exactly half the data is skipped over. (A rough sketch of what I mean is appended at the end of this message.)

(I did this a long time ago, and forget the details. What I may have done for performance reasons was every-other-word, not every-other-byte.)

My thinking was that *any* single algorithm which processes all the data is going to get collisions, eventually. But it will be much harder for someone to generate a duplicate file where there will also be a collision when summing up only half of the data.

I'm not claiming this is a great cure-all solution, just that it's an alternate tactic which might be interesting. People could create repositories with just the one digest, or upgrade to use multiple digests if they have the need. I found a few benchmarks which suggest that SHA-256 is maybe twice as expensive as SHA-1, so calculating two SHA-1 digests might be a reasonable alternative.

-- 
Garance Alistair Drosehn                =     dro...@rpi.edu
Senior Systems Programmer               or    g...@freebsd.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA
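For what it's worth, here is a rough sketch of the double-digest idea in Python, using hashlib's SHA-1. It is only an illustration of the tactic, not a proposal for svn's own code; the function name and parameters are made up for the example:

    #!/usr/bin/env python3
    # Sketch: hash all bytes with SHA-1, and also hash every `stride`-th
    # byte with a second SHA-1, so a forged file has to collide twice.
    import hashlib

    def double_digest(path, stride=2, chunk_size=65536):
        """Return (full_hexdigest, strided_hexdigest) for the file at `path`."""
        full = hashlib.sha1()      # ordinary digest over all bytes
        strided = hashlib.sha1()   # digest over every stride-th byte
        offset = 0                 # running byte offset across chunks

        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                full.update(chunk)
                # Keep the every-stride-th-byte phase consistent across
                # chunk boundaries: take indices i where
                # (offset + i) % stride == 0.
                start = (-offset) % stride
                strided.update(chunk[start::stride])
                offset += len(chunk)

        return full.hexdigest(), strided.hexdigest()

    if __name__ == '__main__':
        import sys
        for name in sys.argv[1:]:
            d1, d2 = double_digest(name)
            print(f'{name}: full={d1} every-other-byte={d2}')

Two files that collide on the full digest would also have to collide on the strided one, which is a different constraint from breaking the underlying hash once.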