On Wed, 29 Dec 2004 13:13:22 -0800, Palit, Nilanjan
<[EMAIL PROTECTED]> wrote:
> Folks,
> 
> Thanks for the good ideas & the performance discussion. I'll try out the
> different suggestions.
> 
> Now, regarding Tom Metro's original suggestion for using an MD5 Digest:
> I read that the original MD5 algorithm has known issues with collisions.
> Any experiences with how well Digest::MD5 does when used with many
> millions of keys? Do I need to test for collisions myself (at the
> expense of lost performance), or is it pretty well tested (or proved?)
> to stand up to an intensive application?

FYI, "known issues" means that there is a known way to construct
two files with the same MD5 hash faster than brute-force search.
That matters for adversarial, security-sensitive uses, not for
accidental collisions among ordinary keys.

Under normal circumstances, to get non-minuscule odds of a
collision somewhere among your MD5 keys, you'd need about 2**64
keys (the birthday bound for a 128-bit hash).  If you have fewer
than, say, a billion keys, you can ignore that possibility for all
practical intents and purposes.
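To put a number on it, here is a back-of-the-envelope birthday-bound
check.  It's a Python sketch (not Perl, just for illustration), and
`approx_collision_probability` is a name I'm making up for it:

```python
# Birthday-bound approximation: among n uniformly random k-bit
# hashes, the expected number of colliding pairs is
# n*(n-1)/2 / 2**k, which also approximates the probability of
# at least one collision when that value is small.
def approx_collision_probability(n, bits=128):
    return (n * (n - 1)) / 2 / 2**bits

# One billion MD5 keys: the probability is astronomically small.
p_billion = approx_collision_probability(10**9)

# Around 2**64 keys the approximation reaches ~0.5, i.e. a
# collision becomes genuinely likely -- the "2**64 keys" figure.
p_birthday = approx_collision_probability(2**64)
```

(The linear approximation is only meaningful while the result is
well below 1; past that point the true probability saturates toward 1.)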

That said, the suggestion of using MD5 keys is a non-starter for
eliminating the performance issue.  Calculating an MD5 hash of a
string of length n is O(n).  In fact, _any_ decent hashing algorithm
takes time proportional to the length of the string: to take less
time it would have to skip parts of the string, and then it couldn't
notice changes in the skipped parts.
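The cost argument is easy to see concretely.  A Python sketch (the
thread itself concerns Perl's Digest::MD5, but hashlib behaves the
same way; `md5_key` is a hypothetical helper name):

```python
import hashlib

# Computing an MD5 digest must read every byte of the input, so it
# is O(n) in the string length -- no cheaper than the full-string
# work it would be replacing.
def md5_key(s):
    return hashlib.md5(s.encode("utf-8")).hexdigest()

md5_key("hello")  # a 32-hex-character (128-bit) digest
```

The digest is always 128 bits regardless of input size, which is why
it's attractive as a fixed-width key, but producing it still touches
every byte of the original string.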

Cheers,
Ben
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
