Hi,

I wish to discuss the issues surrounding the use of SHA1 in Fossil and
their consequences, as well as propose several possibilities to deal
with them.

I would like to take a moment to define collision resistance and
second-preimage resistance. A hash H is collision-resistant if it is
infeasible to come up with x1 and x2 such that H(x1)=H(x2). A hash H is
second-preimage resistant if given some x1, it is infeasible to come up
with x2 such that H(x1)=H(x2). Of course, collision resistance implies
second-preimage resistance (but not the other way around).
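
To make the gap between the two notions concrete, here is a minimal Python
sketch. It assumes a deliberately weakened hash (the first 16 bits of SHA1;
the names toy_hash, find_collision and find_second_preimage are made up for
this illustration) so that both searches finish quickly. A birthday-style
collision search typically needs on the order of 2^(n/2) attempts, while a
second-preimage search against a fixed input needs on the order of 2^n.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size (assumption for the demo); real SHA1 has 160 bits


def toy_hash(data: bytes) -> int:
    """Return the first BITS bits of SHA1 as an integer (deliberately weak)."""
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - BITS)


def find_collision():
    """Birthday search: find ANY two distinct inputs with equal toy hashes."""
    seen = {}
    for i in count():
        msg = b"c%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1  # (x1, x2, number of attempts)
        seen[h] = msg


def find_second_preimage(x1: bytes):
    """Find x2 != x1 whose toy hash equals that of one FIXED input x1."""
    target = toy_hash(x1)
    for i in count():
        msg = b"p%d" % i
        if msg != x1 and toy_hash(msg) == target:
            return msg, i + 1  # (x2, number of attempts)


if __name__ == "__main__":
    a, b, n = find_collision()
    x2, m = find_second_preimage(b"main.c")
    print("collision after", n, "attempts; second preimage after", m)
```

The attacker in a collision search is free to choose both inputs, which is
why it is so much cheaper than hitting the hash of a file somebody else
already wrote.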

First, I contend that the use of SHA1 in Fossil is a serious problem.
Even if no second-preimage attack is ever successful against it, a
collision attack is currently considered possible (although expensive) [1].

How much damage can be done given the capacity to generate collisions?
Suppose the attacker generates two versions of file "main.c" that share
the same SHA1 hash, one which is malicious ("main-malicious.c") and one
which is clean ("main-clean.c"). If the attacker can intercept
communications between the server and a developer, the attacker can push
"main-malicious.c" to the server, intercept the sync between the server
and developer and substitute "main-clean.c" for "main-malicious.c", then
wait until the developer tags and/or signs the change. Moreover, it will
appear as if the developer has PGP signed the malicious version!
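
The substitution scenario above can be sketched in Python. Everything here
is a toy model and not Fossil's actual format: toy_id stands in for a broken
hash (the first 4 hex digits of SHA1), the one-line manifest "main.c <id>" is
a simplification, and an HMAC with a made-up key stands in for the
developer's PGP signature. The point is only that the signature covers the
ID, so it verifies for either colliding file.

```python
import hashlib
import hmac


def toy_id(content: bytes) -> str:
    """Stand-in for a broken hash: first 4 hex digits of SHA1."""
    return hashlib.sha1(content).hexdigest()[:4]


def craft_colliding_pair(clean: bytes, malicious: bytes):
    """Append counter comments until a clean and a malicious variant
    end up with the same toy ID (the attacker controls both files)."""
    seen = {}
    i = 0
    while True:
        c = clean + b"/* %d */\n" % i
        seen[toy_id(c)] = c
        m = malicious + b"/* %d */\n" % i
        if toy_id(m) in seen:
            return seen[toy_id(m)], m
        i += 1


def sign_manifest(key: bytes, manifest: bytes) -> bytes:
    """HMAC used as a stand-in for a PGP clearsignature."""
    return hmac.new(key, manifest, hashlib.sha256).digest()


def verify_checkin(key: bytes, manifest: bytes, sig: bytes, content: bytes) -> bool:
    """Check the signature on the manifest, then the file against its ID."""
    sig_ok = hmac.compare_digest(sign_manifest(key, manifest), sig)
    _name, file_id = manifest.decode().split()
    return sig_ok and toy_id(content) == file_id


if __name__ == "__main__":
    clean, evil = craft_colliding_pair(b"int main(){return 0;}\n",
                                       b"int main(){return 1;}\n")
    manifest = ("main.c " + toy_id(clean)).encode()
    sig = sign_manifest(b"developer-key", manifest)  # developer saw only `clean`
    print(verify_checkin(b"developer-key", manifest, sig, clean))
    print(verify_checkin(b"developer-key", manifest, sig, evil))
```

Both verifications succeed even though the developer only ever saw the
clean file.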

If the attacker is in control of the server, then this becomes even
easier; push the clean version, tell the developer to tag/sign/approve
their checkin, then shun the clean version and replace it with the
malicious one.

If a project is hosted on multiple mirrors that periodically sync with
each other and the attacker knows that the main developer tends to use
only one of them, the attacker can push the clean version to the mirror
that is used by the main developer, and simultaneously push the
malicious one to the other servers.

These concerns are only amplified as the price of generating full SHA1
collisions drops (by further cryptanalytic advancement or by
technological improvements in computing).

Hoping that I have convinced you that this is a serious problem, I would
like to discuss the ways to tackle it.

The first solution is to do nothing and just tell users not to sync with
untrusted repositories. Given the distributed nature of software (and
other) development, I believe it is too heavy a burden to require that
every contributor be carefully vetted and that third-party (web) hosting
never be trusted. I feel that this also breaks the "eternally
incorruptible" promise of Fossil.

The second solution is to incompatibly change the Fossil specification
and replace SHA1 hashes with BetterHash (for some value of BetterHash;
discussion below) in the definition of an artifact ID. This is a
*breaking* change, and requires the *modification* of artifacts (which I
believe is frowned upon in the Fossil community, to say the least). This
would break older hyperlinks (which would be easy to fix automatically
just by replacement when porting the artifacts to the new format), and
most definitely breaks older PGP clearsigned checkins (which would have
remained secure as long as SHA1 second-preimage attacks are infeasible).
The main advantage of this approach is that it is the most elegant and the
easiest to understand and deal with. The main disadvantage is that porting
artifacts to the new format requires their modification (which breaks
the "artifacts never change" promise; I would like to note that that
promise would also be broken as soon as an attacker inserts an artifact
for which a SHA1 collision is known).

The third solution is to change the Fossil specification to redefine the
artifact ID to be the concatenation of the SHA1 and BetterHash hash
digests, and allow 40 hexadecimal digit IDs as prefixes. One can show
that the preimage and collision resistance of this combination is at
least as good as that of the stronger of the two. The main advantage of this
approach is that it is not a breaking change, and does not require the
modification of older artifacts (hyperlinks stay the same too). The main
disadvantage is that if SHA1 second-preimage attacks become feasible, an
attacker can go back and tamper with the pre-change SHA1-only artifacts (and
thus corrupt repositories, or worse). Another disadvantage is that the
SHA1 part of the ID takes up extra room and extra computing time with no
benefit in security.
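
As a rough sketch of what the third solution could look like (the function
names combined_id and matches are hypothetical, and BLAKE2b-512 is assumed
as BetterHash), the following Python uses the standard hashlib module. The
new ID is the 40-digit SHA1 hex digest followed by the 128-digit BLAKE2b-512
hex digest, so every old ID is a prefix of the corresponding new one.

```python
import hashlib

SHA1_HEX_LEN = 40  # length of a legacy SHA1-only artifact ID


def combined_id(content: bytes) -> str:
    """SHA1 hex digest followed by the BLAKE2b-512 hex digest."""
    sha1_part = hashlib.sha1(content).hexdigest()
    better_part = hashlib.blake2b(content, digest_size=64).hexdigest()
    return sha1_part + better_part


def matches(artifact_id: str, content: bytes) -> bool:
    """Accept either a full combined ID or a legacy 40-digit SHA1 ID."""
    full = combined_id(content)
    if len(artifact_id) == SHA1_HEX_LEN:
        # Legacy reference: protected by SHA1 only.
        return full.startswith(artifact_id)
    return artifact_id == full


if __name__ == "__main__":
    cid = combined_id(b"hello")
    print(len(cid), cid[:SHA1_HEX_LEN])
```

Note that a reference that carries only the 40-digit prefix gets no extra
protection; the stronger guarantee applies only where the full ID is stored
and checked.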

As for the exact value of BetterHash, I would like to nominate
BLAKE2b-512 [2]. According to its authors, it is faster than both MD5 and
SHA1 on 64-bit platforms; it is based upon BLAKE, which received a lot of
cryptanalytic attention during the SHA3 competition; and it retains a
large security margin (the best (academic) attack to date is on a reduced
version that does only 2.5 rounds instead of the full 12, and even then
only lowers the security from 512 to 481 bits).

Please let me know your thoughts on this matter.

Best regards,
Eduard

[1] https://sites.google.com/site/itstheshappening/
[2] https://blake2.net/
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
