Hi,

I wish to discuss the issues surrounding the use of SHA1 in Fossil and their consequences, and to propose several ways of dealing with them.
First, I would like to take a moment to define collision resistance and second-preimage resistance. A hash function H is collision-resistant if it is infeasible to find any two distinct inputs x1 and x2 such that H(x1) = H(x2). H is second-preimage-resistant if, given some x1, it is infeasible to find a different x2 such that H(x1) = H(x2). Collision resistance implies second-preimage resistance, but not the other way around. An attacker who can find second preimages can trivially produce collisions. An attacker who can only find collisions gets to choose both inputs, and cannot target a file that already exists.

Now, I argue that the use of SHA1 in Fossil is a serious problem. Even if no second-preimage attack against it ever succeeds, a collision attack is currently considered feasible (although expensive) [1]. How much damage can an attacker do with the ability to generate collisions? Suppose the attacker generates two versions of the file "main.c" that share the same SHA1 hash: a malicious one ("main-malicious.c") and a clean one ("main-clean.c").

If the attacker can intercept communications between the server and a developer, the attacker can push "main-malicious.c" to the server, intercept the sync between the server and the developer, and substitute "main-clean.c" for "main-malicious.c". The attacker then waits until the developer tags and/or signs the change. Because the signed manifest refers to the file only by its SHA1 hash, it will appear as if the developer has PGP-signed the malicious version!

If the attacker controls the server, this becomes even easier. The attacker pushes the clean version and asks the developer to tag, sign, or approve the check-in. The attacker then shuns the clean version and replaces it with the malicious one.

Suppose a project is hosted on multiple mirrors that periodically sync with each other, and the attacker knows that the main developer tends to use only one of them. The attacker can push the clean version to that mirror and simultaneously push the malicious version to the others.

These concerns only grow as the cost of generating full SHA1 collisions drops, whether through further cryptanalytic advances or through improvements in computing technology.
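The following toy experiment makes the gap between the two definitions concrete. It is a minimal sketch and not an attack on real SHA1. It truncates SHA1 to a small number of bits; the `nbits` parameter and all function names are hypothetical, for illustration only. It then compares two searches: a birthday search for a collision, in which the attacker picks both inputs, and a brute-force search for a second preimage of a fixed, already-signed file. For an n-bit hash, the first search is expected to take on the order of 2^(n/2) attempts and the second on the order of 2^n. This is why collision resistance falls first.

```python
import hashlib
import itertools


def truncated_sha1(data: bytes, nbits: int) -> int:
    """Return the top `nbits` bits of SHA1(data): a deliberately weakened toy hash."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest >> (160 - nbits)


def find_collision(nbits: int):
    """Birthday search: the attacker picks BOTH inputs.

    Returns (x1, x2, attempts) with x1 != x2 and equal truncated hashes.
    Expected attempts grow like 2 ** (nbits / 2).
    """
    seen = {}  # truncated hash -> first input that produced it
    for i in itertools.count():
        x = f"main.c variant {i}".encode()
        h = truncated_sha1(x, nbits)
        if h in seen:
            return seen[h], x, i + 1
        seen[h] = x


def find_second_preimage(x1: bytes, nbits: int, limit: int):
    """Brute force: x1 is FIXED (e.g. a file the developer already signed).

    Returns (x2, attempts) with x2 != x1 and the same truncated hash,
    or None if nothing is found within `limit` attempts.
    Expected attempts grow like 2 ** nbits.
    """
    target = truncated_sha1(x1, nbits)
    for i in range(limit):
        x2 = f"forged main.c {i}".encode()
        if x2 != x1 and truncated_sha1(x2, nbits) == target:
            return x2, i + 1
    return None


if __name__ == "__main__":
    NBITS = 18  # toy size; real SHA1 digests have 160 bits
    _, _, collision_attempts = find_collision(NBITS)
    print("collision found after", collision_attempts, "attempts")
    result = find_second_preimage(b"main-clean.c contents", NBITS, limit=1 << 24)
    if result is None:
        print("no second preimage found within the limit")
    else:
        print("second preimage found after", result[1], "attempts")
```

Run as a script, this prints how many attempts each search needed. The exact counts vary with the inputs tried, but the collision search typically needs far fewer attempts than the second-preimage search.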
Having, I hope, convinced you that this is a serious problem, I would like to discuss ways to tackle it.

The first solution is to do nothing and simply tell users not to sync with untrusted repositories. Given the distributed nature of development, of software and otherwise, I believe this imposes too heavy a burden on developers. It would require that every contributor always be carefully vetted and that third-party (web) hosting never be trusted. I also feel that this breaks Fossil's "eternally incorruptible" promise.

The second solution is to change the Fossil specification incompatibly, replacing SHA1 with BetterHash in the definition of an artifact ID (for some value of BetterHash; see the discussion below). This is a *breaking* change, and it requires the *modification* of existing artifacts, which I believe is frowned upon in the Fossil community, to say the least. It would break older hyperlinks, although these would be easy to fix automatically by substituting the new IDs for the old ones while porting the artifacts to the new format. It would also definitely break older PGP-clearsigned check-ins, which would otherwise have remained secure as long as SHA1 second-preimage attacks stay infeasible. The main advantage of this approach is that it is the most elegant, and the easiest to understand and deal with. The main disadvantage is that porting artifacts to the new format requires modifying them, which breaks the "artifacts never change" promise. I would note, however, that this promise is also broken as soon as an attacker inserts an artifact for which a SHA1 collision is known.

The third solution is to change the Fossil specification to redefine the artifact ID as the concatenation of the SHA1 and BetterHash digests, and to accept 40-hex-digit IDs as prefixes. One can show that the preimage resistance and collision resistance of this combination are at least as good as those of the stronger of the two hashes. For example, any collision in the concatenation is simultaneously a collision in SHA1 and a collision in BetterHash.
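Below is a minimal sketch of the third solution using Python's `hashlib`. It assumes BLAKE2b-512 as BetterHash. The combined ID is the 40-hex-digit SHA1 digest followed by the 128-hex-digit BLAKE2b-512 digest, and a legacy 40-digit ID still resolves as a prefix. The `ArtifactStore` class and its methods are hypothetical and do not reflect Fossil's actual internals.

```python
import hashlib


def combined_artifact_id(content: bytes) -> str:
    """Hypothetical new-style ID: hex(SHA1(content)) + hex(BLAKE2b-512(content))."""
    sha1_hex = hashlib.sha1(content).hexdigest()                     # 40 hex digits
    blake_hex = hashlib.blake2b(content, digest_size=64).hexdigest()  # 128 hex digits
    return sha1_hex + blake_hex


class ArtifactStore:
    """Toy content-addressed store keyed by combined IDs (illustrative only)."""

    def __init__(self):
        self._by_id = {}  # full 168-digit ID -> artifact content

    def put(self, content: bytes) -> str:
        aid = combined_artifact_id(content)
        self._by_id[aid] = content
        return aid

    def resolve(self, ref: str) -> str:
        """Resolve a hex reference (e.g. an old 40-digit SHA1 link) to one full ID.

        A legacy 40-digit reference is just a prefix covering the SHA1 half, so
        if two artifacts ever shared a SHA1 digest it would match both of them.
        """
        ref = ref.lower()
        matches = [aid for aid in self._by_id if aid.startswith(ref)]
        if len(matches) != 1:
            raise KeyError(f"{ref!r} matches {len(matches)} artifacts")
        return matches[0]

    def verify(self, aid: str) -> bool:
        """Recompute both digests from the stored content and compare with the ID."""
        return combined_artifact_id(self._by_id[aid]) == aid
```

A legacy 40-digit reference binds only the SHA1 half of the ID. If two stored artifacts ever shared a SHA1 digest, `resolve` would report the reference as ambiguous rather than silently pick one, while a full-length ID would still identify exactly one artifact.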
The main advantage of this approach is that it is not a breaking change. Older artifacts need not be modified, and hyperlinks stay the same. The main disadvantage is that if SHA1 second preimages ever become feasible, an attacker can go back and tamper with the pre-change, SHA1-only artifacts, and thus corrupt repositories, or worse. Another disadvantage is that the SHA1 part of the ID costs extra space and extra computing time without adding any security.

As for the exact choice of BetterHash, I would like to nominate BLAKE2b-512 [2]. It is faster than both MD5 and SHA1 on modern 64-bit CPUs. It is based on BLAKE, which received a lot of cryptanalytic attention during the SHA-3 competition. It also retains a large security margin. The best (academic) attack to date applies only to a reduced version with 2.5 rounds (BLAKE2b has 12), and even then it only lowers the security from 512 to 481 bits.

Please let me know your thoughts on this matter.

Best regards,
Eduard

[1] https://sites.google.com/site/itstheshappening/
[2] https://blake2.net/

_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users