On Sat, 17 Jan 2004, Volker Kuhlmann wrote:

It's a hash, and you can see easily that any number of bytes of input are transformed into 32 bytes of output. From this one can conclude that there have to be different files (of possibly vastly different size?) which transform into the same hash value. Think "number" space.

Yes, that is what I was referring to. But for files of size greater than the length of the md5sum, there have to be different files of the same size generating the same md5sum. Take, e.g., files of size 33
Absolutely correct. This is the "collision" aspect of hash generation, which is better described by CS students I think :-) It's an important effect in index generation and things like that. However, because the chances of a collision in MD5 are "very very small", many people treat collisions as impossible.
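
For illustration only, here is a rough Python sketch of the counting argument (it is not from the original thread). It first shows that the MD5 output length is fixed no matter how long the input is: the digest is 128 bits, i.e. 16 raw bytes, which md5sum prints as 32 hex characters. It then finds two different inputs that agree on a deliberately shortened digest. The helper names md5_prefix and find_prefix_collision and the 2-byte truncation are assumptions made for the demo; nothing like this is feasible by brute force against the full 128-bit value.

```python
import hashlib
from itertools import count


def md5_prefix(data: bytes, nbytes: int) -> bytes:
    """Return the first nbytes of MD5(data): a deliberately weakened hash."""
    return hashlib.md5(data).digest()[:nbytes]


def find_prefix_collision(nbytes: int = 2):
    """Return two different inputs whose truncated MD5 digests are equal.

    With nbytes bytes kept there are only 256**nbytes possible outputs,
    so trying 256**nbytes + 1 distinct inputs must produce a repeat.
    """
    seen = {}  # truncated digest -> first input that produced it
    for i in count():
        msg = str(i).encode()  # distinct input for every i
        h = md5_prefix(msg, nbytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg


# The output length does not depend on the input length.
for size in (0, 33, 100_000):
    d = hashlib.md5(b"x" * size)
    print(size, "bytes in ->", d.digest_size, "bytes out,",
          len(d.hexdigest()), "hex characters")

# Two different inputs, same (truncated) hash.
a, b = find_prefix_collision(2)
print(a, b, md5_prefix(a, 2).hex(), md5_prefix(b, 2).hex())
```

Raising nbytes makes the search slower very quickly, which is the practical reason full MD5 collisions get treated as "impossible" even though the counting argument says they must exist.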
Take a quick look at http://www.jlcooke.ca/psearch/aboutmd5/ , a project which aims to test MD5 by trying to generate the same hash from different inputs (of non-trivial size).
-jim
