On Sat, 17 Jan 2004, Volker Kuhlmann wrote:
...
> It's a hash, and you can see easily that any number of bytes of input
> are transformed into 32 bytes of output. From this one can conclude
> that there have to be different files (of possibly vastly different
> size?) which transform into the same hash value. Think "number" space.

Yes, that is what I was referring to. But for files of size greater than
the length of the md5sum, there have to be different files of the same
size generating the same md5sum. Take, e.g., files of size 33 bytes. Then
you will have at least 256 times as many different files as you have
different md5sums. So there must be at least one pair of different files
of size 33 with identical md5sum. And so on... the longer the files, the
more of them have to have identical md5sums.

But practically this should not be much of a problem, because the files
we use are normally useful and thus have a certain format, so that they
have common parts which do not differ. Hence, with increasing file size,
we are practically exploiting ever smaller fractions of the available
file space.

Example for this hypothesis: how many different one-byte files would be
practically used? I guess 256. How many different two-byte files? 65536.
How many different files of size 1 GB? Well, theoretically there could be
256^(10^9), but this number is possibly greater than the number of atoms
on Earth - could someone check, please? But there are files of this size.
So we can only use a really small fraction of the whole space (at a
time).

As long as the total number of files we are looking at is well below the
square root of the number of different md5sums (the birthday bound), you
would really have to be (un-)lucky to see two different files with the
same md5sum. But it _can_ happen!

> The property that only a small change of input results in a large
> change of output makes it interesting for checksumming, because it
> reduces the probability that 2 similar bit errors in the data cancel
> each other out, i.e. produce the same hash. That's what I understand
> anyway.

Yes, they call it "confusion" and "diffusion"...

Kind regards,

Helmut.

+----------------+
| Helmut Walle   |
| [EMAIL PROTECTED] |
| 03 - 388 39 54 |
+----------------+
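
P.S. For anyone who wants to check the numbers above, here is a rough
Python sketch, not a definitive treatment. It assumes MD5 behaves like an
ideal 128-bit hash, and it takes about 10^50 as the number of atoms on
Earth, which is only a commonly quoted order-of-magnitude estimate. The
function names are made up for illustration. Note that the 32 characters
md5sum prints are hex digits, i.e. 16 bytes of actual digest.

```python
import hashlib
import math

# MD5 digest size: 16 bytes = 128 bits (md5sum prints it as 32 hex characters).
DIGEST_BITS = hashlib.md5().digest_size * 8

# Assumption: rough order-of-magnitude estimate of the atoms on Earth.
LOG10_ATOMS_ON_EARTH = 50


def log10_file_count(size_bytes: int) -> float:
    """log10 of the number of distinct files of exactly size_bytes bytes,
    i.e. log10(256 ** size_bytes), computed without building the huge number."""
    return size_bytes * math.log10(256)


def collision_probability(n_files: int, bits: int = DIGEST_BITS) -> float:
    """Birthday approximation 1 - exp(-n(n-1) / 2^(bits+1)) for the chance
    that at least two of n_files random inputs share a digest.
    Assumes an ideal hash with 2**bits equally likely outputs."""
    return -math.expm1(-n_files * (n_files - 1) / 2.0 ** (bits + 1))


def bits_changed(a: bytes, b: bytes) -> int:
    """Number of bit positions in which the MD5 digests of a and b differ."""
    da = int.from_bytes(hashlib.md5(a).digest(), "big")
    db = int.from_bytes(hashlib.md5(b).digest(), "big")
    return bin(da ^ db).count("1")


if __name__ == "__main__":
    print("digest bits:", DIGEST_BITS)
    print("log10(number of 1 GB files):", log10_file_count(10**9))
    print("more than atoms on Earth?",
          log10_file_count(10**9) > LOG10_ATOMS_ON_EARTH)
    print("P(collision) among 10^12 files:", collision_probability(10**12))
    print("P(collision) among 2^64 files:", collision_probability(2**64))
    print("digest bits flipped by a one-letter change:",
          bits_changed(b"hello world", b"hello worle"))
```

The logarithm is used because 256^(10^9) itself has over two billion
decimal digits. The last line is a small illustration of the "diffusion"
point: a one-letter change in the input should flip a large share of the
128 output bits.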
