Why don't y'all take the same tactic as Git does - SHA1 the contents of the file *and a prepended type/length field*?
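For reference, a minimal Python sketch of what Git actually hashes (the helper name is mine; the "<type> <length>\0" header is Git's real object format):

    import hashlib

    def git_blob_id(content: bytes) -> str:
        # Git prepends a "<type> <length>\0" header before hashing, so
        # an attacker has to collide the header and contents together.
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    # Matches `git hash-object` for a file containing "hello\n":
    print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a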
That, plus a tool to back-convert SHA1s for existing repos. Linus weighed in again:
https://plus.google.com/+LinusTorvalds/posts/7tp2gYWQugL

Svn is more likely than Git to be used as a store for large binaries that need to be non-repudiable, even if that is still an edge case.

-ph

On Sun, Feb 26, 2017 at 11:08 AM, Garance A Drosehn <dro...@rpi.edu> wrote:

> On 24 Feb 2017, at 15:46, Stefan Sperling wrote:
> >
> > I believe we should prepare a new working format for 1.10.0
> > which addresses this problem. I don't see a good way of fixing
> > it without a format bump. The bright side of this is that it
> > gives us a good reason to get 1.10.0 ready ASAP.
> >
> > We can switch to a better hash algorithm with a WC format
> > bump.
>
> One of the previous messages mentioned that better hash
> algorithms are more expensive. So let me mention a tactic
> I used many years ago, when MD5 was the best digest
> algorithm I knew of and I didn't trust it for the larger
> files I was working with at the time:
>
> Instead of going with a completely different hash algorithm,
> just double down on the one you're using. What I did was
> calculate one digest the standard way, and then a second one
> which summed up every other byte (or every 3rd byte, or ...).
> So to get a collision, not only do two files have to get the
> same digest result for all their data, they also have to get
> the same digest result when exactly half the data is skipped
> over.
>
> (I did this a long time ago and forget the details. For
> performance reasons I may have done every other word, not
> every other byte.)
>
> My thinking was that *any* single algorithm which processes
> all the data is going to get collisions eventually. But it
> will be much harder for someone to generate a duplicate file
> which also collides when only half of the data is summed.
>
> I'm not claiming this is a great cure-all solution, just
> that it's an alternate tactic which might be interesting.
> People could create repositories with just the one digest,
> or upgrade to multiple digests if they have the need.
>
> I found a few benchmarks which suggest that sha-256 is maybe
> twice as expensive as sha-1, so calculating two sha-1 digests
> might be a reasonable alternative.
>
> --
> Garance Alistair Drosehn     =      dro...@rpi.edu
> Senior Systems Programmer    or     g...@freebsd.org
> Rensselaer Polytechnic Institute;   Troy, NY;  USA
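A minimal sketch of the double-digest tactic described above, assuming SHA-1 for both passes and every-other-byte for the second pass (both choices are illustrative; the function name is mine):

    import hashlib

    def double_digest(data: bytes):
        # One digest over all of the data, plus a second digest over
        # only every other byte; a crafted collision has to survive
        # both passes to go undetected.
        full = hashlib.sha1(data).hexdigest()
        half = hashlib.sha1(data[::2]).hexdigest()
        return full, half

For example, the two SHAttered PDFs share a SHA-1 over their full contents, but would almost certainly differ in the every-other-byte digest.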
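The sha-1 vs sha-256 cost claim is easy to check on your own hardware; a quick timing sketch (results vary a lot by CPU, e.g. where SHA hardware extensions are available):

    import hashlib
    import timeit

    data = b"x" * (1 << 20)  # 1 MiB of input
    for name in ("sha1", "sha256"):
        secs = timeit.timeit(lambda n=name: hashlib.new(n, data), number=200)
        print(name, round(secs, 3), "seconds for 200 MiB")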