On Jan 17, 2004, at 11:18 PM, Helmut Walle wrote:
> On Sat, 17 Jan 2004, Volker Kuhlmann wrote:
>> It's a hash, and you can see easily that any number of bytes of input
>> are transformed into 16 bytes of output (printed as 32 hex digits). From
>> this one can conclude that there have to be different files (of possibly
>> vastly different size?) which transform into the same hash value. Think
>> "number" space.
> Yes, that is what I was referring to. But for files of size greater
> than the length of the md5sum, there have to be different files of the
> same size generating the same md5sum. Take, e.g., files of size 33

Absolutely correct. This is the "collision" aspect of hash generation, which is better described by CS students I think :-) It's an important effect in index generation and things like that. However, because the chance of any two given inputs colliding in MD5 is "very very small" (about 1 in 2^128 if the hash behaves ideally), many people treat collisions as impossible.
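
To put rough numbers on the counting argument, here is a small Python sketch. It assumes "files of size 33" means 33 bytes, and it uses the fact that an MD5 digest is 128 bits; the variable names are mine, purely for illustration.

```python
# Pigeonhole count: files of one fixed size vs. possible MD5 digests.
DIGEST_BITS = 128   # an MD5 digest is 128 bits (16 bytes, 32 hex digits)
FILE_BYTES = 33     # assumption: "size 33" is read as 33 bytes

num_files = 256 ** FILE_BYTES      # distinct 33-byte files (2**264)
num_digests = 2 ** DIGEST_BITS     # distinct digest values (2**128)

# Average number of 33-byte files per possible digest value.
# Both counts are powers of two, so the integer division is exact.
files_per_digest = num_files // num_digests

print(num_files > num_digests)         # more files than digests
print(files_per_digest == 2 ** 136)    # average bucket size
```

So collisions among same-size files are guaranteed to exist; the count says nothing about how hard it is to actually find one.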


Take a quick look at http://www.jlcooke.ca/psearch/aboutmd5/ , a project which aims to test MD5 by trying to generate the same hash from different inputs (of non-trivial size).
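
For a feel of what such a search involves, here is a toy sketch in Python. It is not the project's method, just a brute-force search against MD5 truncated to 3 bytes (my assumption, chosen so it finishes in a moment); `truncated_md5` and `find_collision` are hypothetical helper names.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, nbytes: int = 3) -> bytes:
    """Return only the first nbytes of the MD5 digest of data."""
    return hashlib.md5(data).digest()[:nbytes]


def find_collision(nbytes: int = 3):
    """Hash the decimal strings "0", "1", "2", ... until two distinct
    inputs share the same truncated digest, then return that pair."""
    seen = {}  # truncated digest -> first input that produced it
    for i in count():
        msg = str(i).encode()
        tag = truncated_md5(msg, nbytes)
        if tag in seen:
            return seen[tag], msg
        seen[tag] = msg


a, b = find_collision()
print(a, b, truncated_md5(a).hex())
```

With only 24 bits of output a match typically turns up after a few thousand tries. Every extra bit kept makes the search harder, and at the full 128 bits this kind of naive search is hopeless in practice.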

-jim


