Hi Warren,

Thanks for replying!

On 10/29/2015 02:46 PM, Warren Young wrote:
> On Oct 28, 2015, at 6:37 PM, Eduard <[email protected]> wrote:
>>
>> I wish to discuss the issues surrounding the use of SHA1 in Fossil
> 
> Have you read the prior discussions on this?
> 
>   http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg18053.html
>   http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg05970.html
>   http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg21423.html

I had read two of the three, yes. Thanks for the third one!

> 
>> First I propose that the use of SHA1 in Fossil is a serious problem.
> 
> The known attacks on SHA-1 are still computationally expensive, and will 
> remain so for years.  Not impossible, but still very difficult.  We have time 
> to move, if we need to.

I agree. I also believe that the best time to think about it is right
now. The number of Fossil users will only increase with time (in fact
I'm about to introduce four new people to Fossil), and so will the
number of people potentially annoyed by a non-backwards compatible
change in the specification.

> But also, and much more importantly, most of the attacks on SHA-1 only apply 
> to standalone blob cases such as binary package validation, X.509 certificate 
> signing, etc.  In Fossil, most of the SHA-1 checksummed artifacts are chained 
> in some way, so that you can only modify the leaves of branches.

You can also modify individual files that are part of commits, and a
substitution like that won't show up in the timeline.

>> If the attacker can intercept
>> communications between the server and a developer
> 
> …then you did not run Fossil over TLS, like you should if MITM is a 
> legitimate risk in your situation. :)
> 
>> If the attacker is in control of the server
> 
> …then he can serve you any content he likes, no matter how good your hash 
> algorithm is.

True, but he shouldn't be able to convince me that ID "abcdef"
corresponds to something other than the original artifact created with
ID "abcdef". Again, I might know (through some other source, e.g.
PGP-signed email) that artifact "abcdef" is genuine, and it shouldn't
matter where I download it from. If artifact "abcdef" refers to "xyzzy",
trusting the genuineness of "abcdef" should imply trusting that of "xyzzy".

I also don't believe that the users and developers should have to trust
the Fossil server (including mirrors) and its operator; I don't have to
trust my Debian mirror to download packages (and their sources) from it.
That would avoid incidents like XcodeGhost.

> The correct solution here is something like TLS with certificate pinning, GPG 
> signing, etc.

That's the thing: GPG signing covers the contents of the manifest, which
itself refers to the files inside it only by their SHA1 hash. If someone
substitutes a file with a malicious file that hashes the same, it won't
change anything in the manifest and the GPG signature will remain valid.
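
To make that concrete, here is a toy sketch in Python (not Fossil code).
It assumes a simplified manifest with one "F name id" line per file, and
it uses SHA1 truncated to 16 bits as a stand-in for a broken hash so that
a second preimage can be brute-forced in well under a second. The names
weak_id, manifest_for and find_colliding are made up for illustration.

```python
import hashlib
import itertools

def weak_id(data: bytes) -> str:
    """Toy stand-in for a broken hash: SHA1 truncated to 16 bits."""
    return hashlib.sha1(data).hexdigest()[:4]

def manifest_for(files: dict[str, bytes]) -> str:
    """Simplified manifest: one 'F name id' line per file, sorted by name."""
    return "".join(
        f"F {name} {weak_id(body)}\n" for name, body in sorted(files.items())
    )

def find_colliding(target: bytes, prefix: bytes) -> bytes:
    """Brute-force different content with the same toy ID as target."""
    want = weak_id(target)
    for n in itertools.count():
        candidate = prefix + str(n).encode()
        if candidate != target and weak_id(candidate) == want:
            return candidate

genuine = {"main.c": b"int main(void) { return 0; }\n"}
evil_body = find_colliding(genuine["main.c"], b"/* malicious */ ")
tampered = {"main.c": evil_body}

# The text that would be signed is byte-for-byte identical, so any
# signature over it verifies for both file sets.
signed_text = manifest_for(genuine)
same_manifest = manifest_for(tampered) == signed_text
```

The signature is computed over signed_text only, so it cannot tell the
two versions of main.c apart.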

>> The third solution is to change the Fossil specification to redefine the
>> artifact ID to be the concatenation of the SHA1 and BetterHash
> 
> A fourth solution is to use Modular Crypt Format to declare the hash for each 
> artifact, and for future Fossil versions to tolerate SHA-1 only in existing 
> artifacts, accepting new ones using only known-good algorithms:
> 
>   https://pythonhosted.org/passlib/modular_crypt_format.html
> 
> This could be done without breaking the DB, simply because a 20-byte hash 
> must be SHA-1, since even a 160-bit BetterHash will have the MCF wrapper on 
> it, making it more than 20 bytes.  
> 
> The SQLite card format parser would have to be made more flexible, to make it 
> understand that if it sees a leading dollar sign, the following hash can be 
> variable-width.

That is a great (and extensible!) solution! There are a few issues though:
- Every artifact must be hashed by every known algorithm. The database
  size grows linearly with the number of hashing algorithms.
- There must be an additional mechanism for upgrading the older hash
  version artifacts. Consider a checkin manifest from 3 years ago. It is
  very likely that no new checkin/branch will ever refer to it directly,
  so nobody will ever refer to it by new-hash. Worse yet, it is likely
  nobody will ever refer to the files inside that checkin by new-hash. If
  a preimage attack on old-hash becomes possible (or even easy), one could
  mess with the artifacts that are only referred to using old-hash.

I don't believe the first issue will ever be a problem, though, since I
personally don't think we'll ever need to go past BetterHash-512.
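
For reference, here is how I picture the tagged-ID scheme, as a minimal
Python sketch. The "$algo$hexdigest" layout and the tag names "sha1" and
"sha256" are my assumptions (nothing in this thread fixes them); the only
rule taken from the proposal is that a bare 40-hex-digit ID must be SHA1
and anything starting with a dollar sign may be variable-width.

```python
import hashlib
import re

# Hypothetical algorithm tags; the thread does not fix any names.
ALGORITHMS = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}
BARE_SHA1 = re.compile(r"[0-9a-f]{40}")
HEX = re.compile(r"[0-9a-f]+")

def parse_id(text: str) -> tuple[str, str]:
    """Split an artifact ID into (algorithm, hex digest)."""
    if not text.startswith("$"):
        # Legacy form: no wrapper, so it can only be SHA1.
        if BARE_SHA1.fullmatch(text):
            return ("sha1", text)
        raise ValueError(f"not a bare SHA1 ID: {text!r}")
    parts = text.split("$")  # "$sha256$ab..." -> ["", "sha256", "ab..."]
    if len(parts) != 3:
        raise ValueError(f"malformed tagged ID: {text!r}")
    _, algo, digest = parts
    if algo not in ALGORITHMS:
        raise ValueError(f"unknown algorithm tag: {algo!r}")
    expected_len = ALGORITHMS[algo]().digest_size * 2
    if len(digest) != expected_len or not HEX.fullmatch(digest):
        raise ValueError(f"bad digest for {algo}: {digest!r}")
    return (algo, digest)

def make_id(data: bytes, algo: str) -> str:
    """Build an ID; SHA1 stays bare so existing IDs are unchanged."""
    digest = ALGORITHMS[algo](data).hexdigest()
    return digest if algo == "sha1" else f"${algo}${digest}"

def verify(data: bytes, artifact_id: str) -> bool:
    """Check content against an ID using whichever algorithm it names."""
    algo, digest = parse_id(artifact_id)
    return ALGORITHMS[algo](data).hexdigest() == digest
```

A policy such as "accept SHA1 only for artifacts that already exist"
would sit on top of parse_id and is not shown here.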

As for the second issue, one solution is to rehash all of the older
artifacts using new-hash and rewrite all of the control artifacts in
terms of new-hash (this operation is fully deterministic and can be
verified independently).

This won't play well with shunned content, since we can't recompute
hashes on artifacts we no longer have, and it will go very badly if one
tries to put shunned content back, since we've probably put some sort of
placeholder null value in the manifest. I don't know whether adding back
shunned content is really an issue; we only shun things when we want
them truly gone forever. But there is still the annoying issue that if
two people don't have the same shunning lists, they will end up with
radically different new-hash artifact sets (one checkin will have a
placeholder whereas the other one won't, and that will change the
artifact IDs of all of the descendants). So I guess exactly one person
should upgrade the hashes once per project, which I don't believe to be
a terrible limitation, especially since their work can be verified
independently.

This also has the annoying side effect of increasing the space taken up
by control artifacts (since we're carrying both the old-hash and the
new-hash versions), but I guess one could purge all of the old-hash
control artifacts from the repository after a few years (once old-hash
is no longer trusted at all).
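
Here is a rough Python sketch of the rehash-and-rewrite step, to show
why it is deterministic and why differing shun lists diverge. It is a toy
model, not Fossil's storage layer: the repository is a dict from old ID
to content, control artifacts are named explicitly in control_ids, SHA256
stands in for new-hash, and a reference to a missing (shunned) artifact
is simply left as its old ID, which is only one of several possible
placeholder choices.

```python
import hashlib
import re

HEX40 = re.compile(rb"\b[0-9a-f]{40}\b")

def old_id(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def new_id(data: bytes) -> str:
    # SHA256 is only a stand-in for whatever new-hash ends up being.
    return hashlib.sha256(data).hexdigest()

def upgrade(store: dict[str, bytes], control_ids: set[str]):
    """Return (new_store, mapping from old ID to new ID)."""
    mapping: dict[str, str] = {}
    new_store: dict[str, bytes] = {}

    def convert(oid: str) -> str:
        if oid in mapping:
            return mapping[oid]
        data = store[oid]
        if oid in control_ids:
            # Rewrite every reference we can resolve; a reference to a
            # missing (shunned) artifact keeps its old ID in this sketch.
            def swap(match):
                ref = match.group(0).decode()
                return convert(ref).encode() if ref in store else match.group(0)
            data = HEX40.sub(swap, data)
        nid = new_id(data)
        mapping[oid] = nid
        new_store[nid] = data
        return nid

    for oid in sorted(store):
        convert(oid)
    return new_store, mapping

# One file, and one control artifact that refers to it by old ID.
file_body = b"hello\n"
file_old = old_id(file_body)
manifest = b"F a.txt " + file_old.encode() + b"\n"
manifest_old = old_id(manifest)

full = {file_old: file_body, manifest_old: manifest}
new_store, mapping = upgrade(full, {manifest_old})

# Same repository, but with the file shunned.
_, shunned_mapping = upgrade({manifest_old: manifest}, {manifest_old})
```

Running upgrade twice on the same input gives the same mapping, so
anyone holding the same artifacts can check the result. With the file
shunned, shunned_mapping[manifest_old] differs from
mapping[manifest_old], and the same would hold for every descendant.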

PGP-clearsigned manifests would probably also need to be re-signed in a
timely manner (in all likelihood the hash function PGP used when signing
the manifest has also been deprecated). (This could be done
after-the-fact using a signed tag.)

There is also the issue that (e.g. URL) references to old-hash artifacts
will be broken. I'm not sure how I feel about that; one could say that
they *should* be broken because we can no longer be certain about what
they point to (assuming that we no longer trust old-hash's security).
Intraproject references can always be fixed (in an automated manner),
but interproject references will likely be much harder to upgrade. It
might be highly useful to write a tool which scans a text file for
artifact-referencing URLs and tries to resolve the hash-upgraded version
automatically assuming that the referenced repository is available locally.
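
A first cut at such a tool could look like the Python sketch below. It
assumes a hypothetical old-to-new table (mapping) has already been
extracted from the locally available repository, and it treats any
40-hex-digit token inside an http(s) URL as an artifact reference, which
is a guess at the URL shapes rather than something Fossil defines.

```python
import re

URL = re.compile(r"https?://\S+")
# Exactly 40 lowercase hex digits, not part of a longer hex run.
HEX40 = re.compile(r"(?<![0-9a-f])[0-9a-f]{40}(?![0-9a-f])")

def upgrade_urls(text: str, mapping: dict[str, str]) -> tuple[str, list[str]]:
    """Rewrite old IDs found inside URLs; also return the IDs not in mapping."""
    unresolved: list[str] = []

    def fix_hash(match):
        old = match.group(0)
        if old in mapping:
            return mapping[old]
        unresolved.append(old)  # leave it alone, but report it
        return old

    def fix_url(match):
        return HEX40.sub(fix_hash, match.group(0))

    return URL.sub(fix_url, text), unresolved
```

Hashes outside of URLs are left untouched, and anything that could not
be resolved is returned so that a human can look at it.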

I'm not sure whether in this approach the new-hash control artifacts
should explicitly list the old-hash artifacts as parents (or maybe as
some new card type). It may be useful as a quick way to identify the
old-hash corresponding control artifact (it may even resolve some
ambiguity when verifying the transition from old-hash to new-hash).
Thoughts?

(Also are there any issues on any of the supported platforms with having
dollar signs in filenames (or URLs)? Just a random thought.)

Best,
Eduard
