I remember some experiments in early development of WC-NG where we measured which checksums worked vs which ones were too expensive. Going to the SHA1 family was at least 5 times more expensive or so…
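For anyone who wants to repeat that kind of measurement, here is a rough sketch. It uses Python's hashlib rather than the checksum code inside Subversion, so the ratios it prints are only indicative. The function name `time_hashes`, the buffer size and the repeat count are arbitrary assumptions for illustration.

```python
import hashlib
import time


def time_hashes(data: bytes, names=("md5", "sha1", "sha256"), repeats=3):
    """Return the best wall-clock time (seconds) per algorithm for hashing data."""
    results = {}
    for name in names:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            hashlib.new(name, data).digest()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    payload = bytes(16 * 1024 * 1024)  # 16 MiB of zero bytes, arbitrary size
    timings = time_hashes(payload)
    baseline = timings["md5"]
    for name, seconds in timings.items():
        # Report each algorithm relative to MD5 on this machine.
        print(f"{name:>7}: {seconds * 1000:8.2f} ms  ({seconds / baseline:.2f}x md5)")
```

The numbers depend heavily on the CPU and on how the hash library was built, so they should be read as a local data point, not as a general ranking.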
We determined back then that SHA1 was good enough for our use and that of our users ‘except for those doing collision research’. Just adding more checksums internally, because we can, won’t help our users… The only real solution is doing full comparisons when checksums match… which virtually never happens. It happened for the first time now, so most likely never before for all of the Subversion users together.

This is how we used MD5 before… But we determined SHA1 would be good enough to avoid this, even when such a collision would be found… as it is today.

I don’t think this incident changes those original ideas about which hash is good enough… Perhaps some careful re-evaluation is necessary, but I don’t think we should just ‘fix this’ by bumping everything to the next hash type.

This ‘just use a more expensive hash’ may be a good approach for other users of hashes, but I don’t think we want to make every common Subversion operation much slower because one collision was found using an insane amount of CPU/GPU power.

Of course we should fix things to not break, but that is a different story.

Bert

From: Stefan Sperling
Sent: Friday, 24 February 2017 17:10
To: Andreas Stieger
Cc: Subversion Development
Subject: Re: Files with identical SHA1 breaks the repo

On Fri, Feb 24, 2017 at 04:17:44PM +0100, Andreas Stieger wrote:
> Hi,
>
> "Stefan Hett" wrote:
> > On 2/23/2017 9:02 PM, Øyvind A. Holm wrote:
> > > This is the only known SHA-1 collision at the moment, but Google will
> > > release the collision code in 90 days, so we can expect this not to last
> > > forever.
> > Reading up on that in an article on a German magazine [1] clarifies that
> > the effort to create that hash still quite large (6500 CPU years + 100
> > GPU years to calculate the collision). So this relativates the impact a bit.
> > Certainly I'm not trying to say that the situation on SVN's side
> > should/could not be improved, though.
> >
> > [1]
> > https://www.heise.de/newsticker/meldung/Todesstoss-Forscher-zerschmettern-SHA-1-3633589.html
>
> An occurrence of this issue in a production repository with the published
> PDFs:
> https://bugs.webkit.org/show_bug.cgi?id=168774#c29
>
> Andreas

Well, what did they expect?
Did they expect that all software which is part of their toolchain has
ever been tested with files that produce a SHA1 collision?
Nobody had such files until yesterday...
They should have tried this on a test repository first.

Anyway, so SVN has multiple problems with SHA1 collisions.

One problem is that the libsvn_wc code does the wrong thing when SHA1
hashes match but MD5 hashes do not. The error on checkout is happening
because pristines are keyed on SHA1, and only one pristine is saved:

$ ls .svn/pristine/
38/
$ ls .svn/pristine/38/
38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
$ sha1 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
SHA1 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = 38762cf7f55934b34d179ae6a4c80cadccbb7f0a
$ md5 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
MD5 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = ee4aa52b139d925f8d8884402b0a750c

By design, the current working copy format cannot store both of these PDFs.
This is hard to solve without a working copy format bump :-/
The best fix would probably be moving libsvn_wc to SHA256 or SHA3.

FSFS looks alright.
The node records for these two PDFs look like this:

[[[
id: 0-1.0.r1/5
type: file
count: 0
text: 1 3 381130 422435 ee4aa52b139d925f8d8884402b0a750c 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_3
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-1.pdf
copyroot: 0 /

id: 2-1.0.r1/6
type: file
count: 0
text: 1 3 381130 422435 5bd9d8cabc46041579a311230539b8d1 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_4
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-2.pdf
copyroot: 0 /
]]]

We should look into making the FSFS code make use of both checksums
to handle ambiguities. It seems about time to add SHA256 and/or SHA3 as well.

'svnadmin load' fails, too:

$ svnadmin create repo2
$ vi repo
repo/ repo2/
$ vi repo2/db/fs
fs-type fsfs.conf
$ vi repo2/db/fsfs.conf # disable rep-sharing
$ svnadmin dump repo > repo.dump
* Dumped revision 0.
* Dumped revision 1.
$ svnadmin load repo2 < repo.dump
<<< Started new transaction, based on original revision 1
     * editing path : shattered-1.pdf ... done.
     * editing path : shattered-2.pdf ...subversion/libsvn_repos/load.c:709,
subversion/libsvn_repos/load.c:351,
subversion/libsvn_subr/stream.c:273,
subversion/libsvn_subr/checksum.c:658: (apr_err=SVN_ERR_CHECKSUM_MISMATCH)
svnadmin: E200014: Checksum mismatch for '/shattered-2.pdf':
   expected: 5bd9d8cabc46041579a311230539b8d1
     actual: ee4aa52b139d925f8d8884402b0a750c

Again, the dump file looks OK. This problem occurs somewhere in the
commit processing path. No time to debug this ATM.
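
To illustrate what "make use of both checksums to handle ambiguities" could look like, here is a minimal sketch of a store keyed on SHA1 that falls back to MD5 and a full comparison before it shares an entry. This is not FSFS or libsvn_wc code. `RepCache`, `key_func`, `check_func` and the slot numbering are hypothetical names made up for the example, and the pluggable `key_func` exists only so that a collision can be forced without real colliding files.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class RepCache:
    """Toy store keyed on one checksum, disambiguated by a second one."""

    def __init__(self, key_func=sha1_hex, check_func=md5_hex):
        self._key_func = key_func
        self._check_func = check_func
        self._buckets = {}  # primary key -> list of (secondary checksum, content)

    def add(self, content: bytes):
        """Store content and return (key, slot); identical content is shared."""
        key = self._key_func(content)
        check = self._check_func(content)
        bucket = self._buckets.setdefault(key, [])
        for slot, (stored_check, stored) in enumerate(bucket):
            # Cheap secondary checksum first, then the full comparison.
            if stored_check == check and stored == content:
                return key, slot
        # Same primary key but different content: keep both instead of
        # silently reusing the first entry.
        bucket.append((check, content))
        return key, len(bucket) - 1

    def get(self, key: str, slot: int) -> bytes:
        return self._buckets[key][slot][1]


if __name__ == "__main__":
    # Force every input onto the same primary key to simulate a collision.
    cache = RepCache(key_func=lambda data: "collide")
    first = cache.add(b"shattered-1")
    second = cache.add(b"shattered-2")
    print(first, second, cache.add(b"shattered-1") == first)
```

A store that keeps only one entry per SHA1 key, as the pristine directory listing above shows, has no room for the second slot. That is why the same idea in the working copy would need a format change.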