Re: [fossil-users] SHA1 and security

Eduard Thu, 29 Oct 2015 07:41:38 -0700

Hi Scott,

Thank you for your reply!

On 10/29/2015 01:40 AM, Scott Robison wrote:
> On Wed, Oct 28, 2015 at 6:37 PM, Eduard <[email protected]
> <mailto:[email protected]>> wrote:
> 
> If fossil didn't say it used SHA1 to generate artifact IDs, I don't
> think anyone would care how it generated IDs.
> 
> An artifact ID is a way of assigning a fixed length identifier to an
> artifact with good distribution of IDs in the fixed length space
> provided. It is not intended to be a cryptographic.
> (...)
> In this case, it's just an identifier, and the odds of a
> non-malicious collision are so close to zero that those odds might as
> well be zero.

That's the thing, an artifact ID is *not* just an identifier! I don't
know whether this is an intentional feature or not, but a Fossil
repository is structured as a Merkle tree (or a "hash tree"). If I know
that a single commit is genuine (because I wrote it down on a piece of
paper or because it is PGP signed), then that guarantees that all of its
ancestors (and all of their files) have not been tampered with. This is
a very powerful property, assuming that the hash function is
cryptographically secure.

In contrast, suppose that instead artifacts were identified by their
CRC160 (like CRC32, but longer) or by some sort of GUID. Accidental
collisions would be extremely unlikely, but intentional collisions or
preimages would be trivial to forge. Knowing that a particular commit
manifest is genuine would be pointless; anyone could make up a different
ancestor tree that matches the parent ID in the manifest. Worse yet,
that would not even guarantee the 'goodness' of the files in the commit
(anyone would be able to make up a different file with the same artifact
ID).

> You can't create a collision in advance based on not knowing who is
> going to commit what to the repository in advance.
> 
> Let's say you do, after the fact, manage to create a collision. If you
> try to upload it to the repository it will be ignored because fossil
> believes (correctly) it already has the artifact in question.

That's not a collision, that's a second-preimage. My point is rather
that an attacker can serve the clean version of the file to the
auditors, and the malicious version to the users (who trust the auditors
to tell them whether a commit is okay to use). This doesn't require a
second-preimage attack against the hash, only the ability to generate
collisions (which is possible right now, but expensive).

> As you observe, one could in theory mount a MITM attack. At this point
> what is to stop them from serving a completely alien repository that
> they've specially crafted? No collisions required.

It would then be obvious that the repository is completely alien since
none of the artifact IDs match. If the user knows at least one top-level
genuine artifact ID (for example, because they trust a developer and
that developer signs their commits), then that guarantees the integrity
of all the ancestors.

> In fact, the "easiest" way to getting people to use malicious software
> is to host a compromised repository and convince people to use it
> instead of the "blessed" repository.

I agree, that's the way it usually works in practice. But no amount of
checking (save for re-hashing every single artifact in a different
database and signing *that*) can protect against SHA1 collision (or
second-preimage) attacks.

> If you want to change the way fossil does things to limit the
> possibility of fraudulent artifacts, that's fine. Perhaps prefixing the
> blob data with a length (ala git) might help mitigate the possibility of
> hash collisions. Perhaps creating a hash of the complete commit (vs just
> the manifest) and storing it in the manifest might help.

If the hash function itself is broken, neither of those would work. The
first fix would only work if SHA1 collisions are only possible with
different-length inputs (from what I understand, they're usually on
equal-length inputs), and the second fix is essentially the R-card
(which has been deprecated for efficiency reasons, and because it is
redundant with the files' artifact IDs which (are supposed to) guarantee
their contents' genuineness).

> Ultimately, one can chase hash algorithms forever trying to create some
> ultimately secure ideal. In the case of actual security software, I can
> see the point.

A DVCS *is* a security software. It is perhaps the most important
security software, since the security and integrity of all other
software ultimately depends on it. The integrity of the sqlite
repository (which is probably used by over a billion people right now)
depends on it.

> one can chase hash algorithms forever

I know that it may appear so (I thought so too for a long time), but
it's not a question of choosing better hash algorithms forever either.
>From the very beginning SHA1 had a very short digest (n=160 bits), so
the most security it could ever have against a collision attack is
n/2=80 bits (i.e. that's assuming that it is a perfect hash function,
which we now know it isn't). It stands to reason that any significant
reduction in collision security from that point would bring an attack
into the realm of feasibility (just 30 bits would be enough to utterly
break it).

Suppose that instead we chose a 512-bit hash. Its maximum collision
security is 256 bits. Suppose that some cryptographic attack eventually
manages to downgrade that to 196 bits (i.e. a 60 bit reduction in
security, which is huge and extremely unlikely). That means it would
take 2^196 operations to generate a collision. Could technological
advancement maybe make that possible someday? I'll just quote Bruce
Schneier's calculation:

-- begin quote --

    One of the consequences of the second law of thermodynamics is that
a certain amount of energy is necessary to represent information. To
record a single bit by changing the state of a system requires an amount
of energy no less than kT, where T is the absolute temperature of the
system and k is the Boltzman constant. (Stick with me; the physics
lesson is almost over.)

    Given that k = 1.38×10^-16 erg/°Kelvin, and that the ambient
temperature of the universe is 3.2°Kelvin, an ideal computer running at
3.2°K would consume 4.4×10^-16 ergs every time it set or cleared a bit.
To run a computer any colder than the cosmic background radiation would
require extra energy to run a heat pump.

    Now, the annual energy output of our sun is about 1.21×10^41 ergs.
This is enough to power about 2.7×10^56 single bit changes on our ideal
computer; enough state changes to put a 187-bit counter through all its
values. If we built a Dyson sphere around the sun and captured all its
energy for 32 years, without any loss, we could power a computer to
count up to 2^192. Of course, it wouldn't have the energy left over to
perform any useful calculations with this counter.

    But that's just one star, and a measly one at that. A typical
supernova releases something like 10^51 ergs. (About a hundred times as
much energy would be released in the form of neutrinos, but let them go
for now.) If all of this energy could be channeled into a single orgy of
computation, a 219-bit counter could be cycled through all of its states.

    These numbers have nothing to do with the technology of the devices;
they are the maximums that thermodynamics will allow. And they strongly
imply that brute-force attacks against 256-bit keys will be infeasible
until computers are built from something other than matter and occupy
something other than space.

-- end quote --

Best,
Eduard
_______________________________________________
fossil-users mailing list
[email protected]
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users

Re: [fossil-users] SHA1 and security

Reply via email to