[trimmed cc list, nobody wants to read this noise]

On Sat, Apr 16, 2005 at 11:35:39PM +0200, Brian O'Mahoney wrote:
> >> (1) I _have_ seen real-life collisions with MD5, in the context of
> >>    Document management systems containing ~10^6 ms-WORD documents.
> > 
> > Dude!  You could have been *famous*!  Why the
> > aitch-ee-double-hockey-sticks didn't you publish this when you found it?
> > Seriously, man.
> 
> The MD5 has was fine, or at least the code (a) produced the correct
> results on the published test cases, and, (b) was properly applied to
> all bytes of the file(s). I was surprised when it happened, which is why
> I bothered to post to this list at this time, so I make two more points

OK, I guess it's time for some remedial math.

There are 2^128 = 340282366920938463463374607431768211456 different MD5
hashes.

You are suggesting that you found a collision using ~1e6 = ~1,000,000
plaintexts.

Let's suppose there were actually 100,000,000 = 1e8 plaintexts, just in
case you underestimated the number.

Applying the birthday paradox, we have a 50% probability that you'd find
one collision if there were ~7,213,475,309,916,173 possible hash values.
If you extend the birthday argument ("what is the probability of a
collision if you take N samples from a set of size M?") you get the
following results, with N = 1e8:

50% (1 in 2) probability of collision in           7,213,475,309,916,173.
1% (1 in 100) probability of collision in        497,496,027,172,833,194.
.05% (1 in 1845) probability of collision in   9,223,372,036,854,775,806.

That's where my quick-and-dirty solver craps out, but we're still a
really long ways from

                     340,282,366,920,938,463,463,374,607,431,768,211,456.

A simple linear extrapolation (certainly wrong, but not by more than a
few dozen orders of magnitude) says that the odds would be
1 in 68,056,473,384,187,692,692 for the full MD5 hash (I'm not even
going to dignify that with a percentage).

I'm not going to do the sums, but I would hazard a guess that it's more
likely your PC suffered a cosmic-ray-induced memory fault - EACH OF THE
FOUR TIMES YOU TESTED IT - causing it to report the same MD5, than that
you actually discovered a collision with a measly million (or even
hundred million) plaintexts.

(Of course, I don't know how many tests of the hash you actually did.
But the point stands.)

Hell, if you're *that* lucky, what are you doing in IT?  You could be
making a killing at the roulette table.

Or, even more likely, there was some other factor in the system (most
likely that it was only using a few bits, probably 32, of the hash
when looking for collisions) that resulted in a false alarm.

If you had actual evidence of a collision, I'd love to see it - even if
it's just the equivalent of
% md5 foo
d3b07384d113edec49eaa6238ad5ff00 foo
% md5 bar
d3b07384d113edec49eaa6238ad5ff00 bar
% cmp foo bar
foo bar differ: byte 25, line 1
%

But in the absence of actual evidence, we have to assume (just based on
the probabilities) that there was some error in your testing.

-andy
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to