Folks:

dstufft already correctly explained that relying on MD5 allows for
"doppelganger" packages -- two (or more) packages which are engineered at birth
to have the same hash as each other. It isn't clear to me that this can be used
for evil, but it isn't obvious that it *can't* be used for evil, either. So it
would certainly be helpful to upgrade the hash function so that we don't have
to think about that anymore, but in my opinion it is not an emergency.

I'd like to push back on the other risk, that someone might figure out how to
make MD5 second-pre-images. I don't think this is a risk that we need to
urgently address, and I've written a short note explaining why. This note is
incomplete, badly edited, has not been peer-reviewed, and is not ready for
publication, but I thought it might help folks evaluate how urgent it is to
upgrade from MD5, so here it is.

Regards,

Zooko
???
========================================================================================
the historical success of collision attacks does not imply a danger of 
pre-image attacks
========================================================================================

by Zooko Wilcox-O'Hearn, LeastAuthority.com, 2013-07-29

Summary
=======

Most of the secure hash functions designed before 2007 have turned out to be
vulnerable to collision attacks. This includes the widely-used secure hash
functions MD5 and SHA-1.

Newer hash functions, including those designed since the SHA-3 project began
in 2007, do not appear to be vulnerable to collision attacks, but since they
are newer, there has also been less time for cryptanalysts to find flaws in
them.

A widely cited web page shows a graphical representation of the history of
various hash functions being broken:

http://valerieaurora.org/hash.html

The advice on that web page is that if you are relying on your hash function
for collision-resistance, then you should be prepared to migrate to a new
hash function every few years.

.. insert table based on the main table below, but showing only vulnerability 
to collisions instead of pre-images

What about pre-image attacks or second pre-image attacks?

Some systems require their secure hash function to resist pre-image and
2nd-pre-image attacks, but do not require their secure hash function to
resist collision attacks. For such systems, an interesting question is
whether many or few of the secure hash functions published in the last 23
years???since the advent of modern cryptography???have turned out to be
vulnerable to pre-image or 2nd-pre-image attacks.

The answer is that with a single exception, no secure hash function has ever
been shown to be vulnerable to (2nd-)pre-image attacks. That single exception
is the second-oldest hash function ever designed, Snefru, which was designed
in 1989 and 1990, and which turned out to be vulnerable to differential
cryptanalysis. Differential cryptanalysis was discovered (by the open
research community) in 1990.

Preliminaries
=============

The input to a secure hash function is called the *pre-image* and the output
is called the *image*.

A hash function *collision* is two different inputs (pre-images) which result
in the same output. A hash function is *collision-resistant* if an adversary
can't find any collision.

A hash function is *pre-image resistant* if, given an output (image), an
adversary can't find any input (pre-image) which results in that output.

A hash function is *second pre-image resistant* if, given a pre-image, an
adversary can't find any *other* pre-image which results in the same image.

Motivation
==========

My motivation for caring about pre-image resistance and 2nd-pre-image
resistance is that it is possible to build digital signatures from secure
hash functions. Some of these *hash-based digital signatures* have been
proven to be secure (resistant to forgery) as long as the hash function they
are built out of has second-pre-image resistance, e.g. Buchmann-2011_.

Such a hash-based digital signature would fail if its underlying hash
function failed at second-pre-image resistance, but this is the *only* way
that it could be broken???any attack which was able to forge digital signatures
against such a scheme would *have* to violate the second-pre-image resistance
of the underlying hash function.

One reason that hash-based digital signatures might be useful is that if an
attacker has a sufficiently large quantum computer, they could forge digital
signatures that rely on factorization or discrete log, such as RSA, DSA,
ECDSA, or Ed25519. There is not any reason to think that such a quantum
computer would enable them to break secure hash functions, however.

Survey of attacks on hash functions
===================================

We know that practical, widely-deployed secure hash functions have turned out
to be vulnerable to collision attacks. MD5 is the most widely used secure
hash function which turns out to admit collisions, but many other secure hash
functions have likewise been found vulnerable to collisions.

For some uses, such as the hash-based digital signatures mentioned above, it
would be harmless to be able to generate collisions, but harmful to be able
to generate pre-images or 2nd-pre-images [*].  For such systems, the relevant
question is not whether hash function designs have historically been revealed
to be vulnerable to collisions but instead whether they've been revealed to
be vulnerable to (2nd-)pre-images.

Here are the results of my search for pre-image or 2nd-pre-image attacks on
widely-studied hash functions.

*The bottom line is that no widely-studied hash function has ever succumbed
to a (2nd-)pre-image attack except for Snefru.*

Snefru was designed by Merkle in 1989 and proved vulnerable to differential
cryptanalysis by Biham and Shamir in 1992. (Differential cryptanalysis had
just been discovered in the open world by Biham and Shamir in 1990.) No other
widely-studied hash function has been shown to be vulnerable to a practical
(2nd-)pre-image attack. Furthermore, no other widely-studied hash function
has been shown to be vulnerable to a (2nd-)pre-image attack that is more
efficient than brute force, even if we were to count attacks too expensive
for anyone to actually implement!

The history of (2nd-)pre-image attacks is therefore quite different from the
history of collision attacks. Most hash functions have been proven vulnerable
to collision attacks more efficient than brute force, and even to collision
attacks that could be implemented in practice.

Here I omit papers with attacks which are less efficient than other published
attacks (of course). I omit attacks on reduced-round variants of hash
functions (there are a lot of those). I omit attacks which have unrealistic
requirements, like attacks which require which require 2??????? precomputation 
or
which require the messages to be 2?????? blocks long.

"---" in a row means that I haven't found any published attack that fit these
criteria.

??? *bit*: the number of bits of output
??? *cpb*: cycles per byte on ebash's amd64-sandy0_ for 4096 bytes, worst 
quartile
??? *comp*: approximate computation required for the attack
??? *mem*: approximate memory required for the attack
??? *??? brute*: does the attack comp * the attack mem cost less than brute 
force search (see Bernstein-2005_)
??? *possible?*: could the attack actually be implemented

+-------------+------+-----+-----+--------------------------------------------------------+--------------------------------------------------------+
| hash        | year | bit | cpb | (2nd-)preimage attacks                       
          | collision attacks                                      |
|             |      |     |     
+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
|             |      |     |     | comp | mem | ??? brute | possible?  | 
reference          | comp | mem | ??? brute | possible?  | reference          |
+=============+======+=====+=====+======+=====+=========+============+====================+======+=====+=========+============+====================+
| MD2         | 1989 | 128 | 638 | 2?????  | 2????? | no      | ---        | 
Knudsen-2007_      | 2?????  | 2????? | no      | ---        | Knudsen-2007_    
  |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| Snefru_ -2  | 1990 | 128 |   ? | 2?????  | 2???  | yes     | yes        | 
Biham-2008_        | 2????  | 2???  | yes     | yes        | Biham-2008_        
|
+-------------+      |     
+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+
                    |
|         -3  |      |     |   ? | 2??????  | 2???  | yes     | yes        |    
                | 2?????  | 2???  | yes     | yes        |                    |
+-------------+      |     +-----+------+-----+---------+------------+          
          +------+-----+---------+------------+                    |
|         -4  |      |     |   ? | ???2?????? | 2???  | yes     | no         |  
                  | ???2?????? | 2???  | yes     | yes        |                 
   |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| MD4         | 1990 | 128 |   4 | 2??????  | 2????? | no      | ---        | 
Zhong-2010_        | 2??   | 2???  | yes     | yes        | Naito-2006_        |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| RIPEMD      | 1990 | 128 |   ? | ---  | --- | ---     | ---        | ---      
          | 2?????  | 2???  | yes     | yes        | Wang-2005a_        |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| MD5         | 1991 | 128 |   6 | 2?????? | 2?????? | no      | ---        | 
Sasaki-2009_       | 2?????  | 2???  | yes     | yes        | Stevens-2007_     
 |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| HAVAL-256-3 | 1992 | 256 |   ? | 2??????? | 2?????? | no      | ---        | 
Sasaki-2008_       | 2?????  | 2???  | yes     | yes        | 
`Van_Rompay-2003`_ |
+-------------+      +     +-----+------+-----+---------+------------+          
          +------+-----+---------+------------+--------------------+
|          -4 |      |     |   ? | 2???????? | 2?????? | no      | ---        | 
                   | 2?????  | 2???  | yes     | yes        | Yu-2006_          
 |
+-------------+      +     +-----+------+-----+---------+------------+          
          +------+-----+---------+------------+                    +
|          -5 |      |     |   ? | 2???????? | 2?????? | no      | ---        | 
                   | 2?????? | 2???  | yes     | no         |                   
 |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| SHA-0       | 1993 | 160 |   ? | 2???????? | 2???  | no      | ---        | 
---                | 2?????  | 2???  | yes     | yes        | Manuel-2008_      
 |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| GOST        | 1994 | 256 |   ? | 2??????? | 2?????? | no      | ---        | 
Mendel-2008_       | 2???????? | 2???  | yes     | no         | Mendel-2008_    
   |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| SHA-1       | 1995 | 160 |   8 | ---  | --- | ---     | ---        |          
          | 2??????  | 2???  | yes     | yes        | Wang-2005b_        |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| RIPEMD-160  | 1996 | 160 |  16 | ---  | --- | ---     | ---        | ---      
          | ---  | --- | ---     | ---        | ---                |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| Tiger       | 1996 | 192 |   7 | 2???????? | 2???  | no      | ---        | 
Guo-2010_          | ---  | --- | ---     | ---        | ---                |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| Whirlpool   | 2000 | 512 |  25 | ---  | --- | ---     | ---        | ---      
          | ---  | --- | ---     | ---        | ---                |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| Panama      | 2000 | 512 |   ? | ---  | --- | ---     | ---        | ---      
          | 2???   | 2???  | yes     | yes        | Daemen-2007_       |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| SHA-256     | 2002 | 256 |  18 | ---  | --- | ---     | ---        | ---      
          | ---  | --- | ---     | ---        | ---                |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+
| RadioGat??n  | 2006 | 256 |   ? | ---  | --- | ---     | ---        | ---     
           | ---  | --- | ---     | ---        | ---                |
+-------------+------+-----+-----+------+-----+---------+------------+--------------------+------+-----+---------+------------+--------------------+

.. _Daemen-2007: http://radiogatun.noekeon.org/panama/PanamaAttack.pdf

Then we get to the SHA-3 competition, which began in 2007. None of the 14
second-round candidates for SHA-3 have been shown to be vulnerable to any
attack better than brute force, either to find collisions or to find
(2nd-)pre-images [SHA-3-Zoo_ ].

Here are rows for the five SHA-3 finalists. The cpb is potentially relevant
because a hash function which used *too* few cycles per byte would fail at
its goals, including its goal of (2nd-)pre-image-resistance. Of course we
don't know how few is too few. The designers of these hash functions were
probably choosing a number of cycles per byte to make their function
competitive with SHA-256, and with other SHA-3 candidates, while not
accidentally losing collision-resistance like so many of their predecessors
had.


+------------+------+-----+-----+---------------------------------------------------+---------------------------------------------------+
| hash       | year | bit | cpb | (2nd-)preimage attacks                        
    | collision attacks                                 |
|            |      |     |     
+------+-----+---------+------------+---------------+------+-----+---------+------------+---------------+
|            |      |     |     | comp | mem | ??? brute | possible?  | 
reference     | comp | mem | ??? brute | possible?  | reference     |
+============+======+=====+=====+======+=====+=========+============+===============+======+=====+=========+============+===============+
| Skein      | 2010 | 256 |   7 | ---  | --- | ---     | ---        | ---       
    | ---  | --- | ---     | ---        | ---           |
+------------+------+-----+-----+------+-----+---------+------------+---------------+------+-----+---------+------------+---------------+
| Blake      | 2010 | 256 |   8 | ---  | --- | ---     | ---        | ---       
    | ---  | --- | ---     | ---        | ---           |
+------------+------+-----+-----+------+-----+---------+------------+---------------+------+-----+---------+------------+---------------+
| Gr??stl     | 2010 | 256 |  10 | ---  | --- | ---     | ---        | ---      
     | ---  | --- | ---     | ---        | ---           |
+------------+------+-----+-----+------+-----+---------+------------+---------------+------+-----+---------+------------+---------------+
| Keccak     | 2010 | 256 |  12 | ---  | --- | ---     | ---        | ---       
    | ---  | --- | ---     | ---        | ---           |
+------------+------+-----+-----+------+-----+---------+------------+---------------+------+-----+---------+------------+---------------+
| JH         | 2010 | 256 |  14 | ---  | --- | ---     | ---        | ---       
    | ---  | --- | ---     | ---        | ---           |
+------------+------+-----+-----+------+-----+---------+------------+---------------+------+-----+---------+------------+---------------+

If you are aware of any other papers which fit these criteria, or if you spot
an error in this document, please write to me: zo...@leastauthority.com.

[*] Be very careful about this???the ability to generate collisions can be
surprisingly harmful to some systems. This is one of those subtleties of
cryptographic engineering which frequently trip up engineers who are not
cryptography experts. The famous "Internet Root Cert" attack [Sotirov-2009_ ]
is an example of engineers incorrectly thinking that their system was not
threatened by collisions absent 2nd-pre-images.

Thanks to Daira Hopwood for comments on this note.

.. _Snefru: http://www.springerlink.com/content/t10683l407363633/
.. _Van_Rompay-2003: 
http://academic.research.microsoft.com/Publication/676305/cryptanalysis-of-3pass-haval
.. _Bernstein-2005: http://cr.yp.to/papers.html#bruteforce
.. _Wang-2005a: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.4759
.. _Wang-2005b: 
http://people.csail.mit.edu/yiqun/SHA1AttackProceedingVersion.pdf
.. _Yu-2006: http://www.springerlink.com/content/0n9018738x721090/
.. _Naito-2006: http://www.springerlink.com/content/v6526284mu858v37/
.. _Stevens-2007: 
http://marc-stevens.nl/research/papers/MTh%20Marc%20Stevens%20-%20On%20Collisions%20for%20MD5.pdf
.. _Knudsen-2007: http://www.springerlink.com/content/qn746388035614r1/
.. _Biham-2008: http://www.springerlink.com/content/208q118x13181g32/
.. _Sasaki-2008: http://www.springerlink.com/content/d382324nl16251pp/
.. _Mendel-2008: http://www.cosic.esat.kuleuven.be/publications/article-2091.pdf
.. _Manuel-2008: http://www.springerlink.com/content/3810jp9730369045/
.. _Leurent-2008: http://www.di.ens.fr/~leurent/files/MD4_FSE08.pdf
.. _Sasaki-2009: http://www.springerlink.com/content/d7pm142n58853467/
.. _Sotirov-2009: http://www.win.tue.nl/hashclash/rogue-ca/
.. _Zhong-2010: http://eprint.iacr.org/2010/583
.. _Guo-2010: http://eprint.iacr.org/2010/016
.. _Buchmann-2011: http://eprint.iacr.org/2011/484
.. _SHA-3-Zoo: http://ehash.iaik.tugraz.at/wiki/The_SHA-3_Zoo
.. _amd64-sandy0: http://bench.cr.yp.to/results-hash.html#amd64-sandy0
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig

Reply via email to