>>>>> "PN" == Palit, Nilanjan <[EMAIL PROTECTED]> writes:

  PN> Now, regarding Tom Metro's original suggestion for using an MD5
  PN> Digest: I read that the original MD5 algorithm has known issues
  PN> with collisions. Any experiences with how well Digest::MD5 does
  PN> when used with many millions of keys? Do I need to test for
  PN> collisions myself (at the expense of lost performance), or is it
  PN> pretty well tested (or proved?) to stand up to an intensive
  PN> application?

Theoretically you can have colliding MD5 hashes, so I wouldn't rely on
the digest alone. But you could do a two-level hash, with the top level
keyed by the MD5 digests; under each digest you would either hash again
on the real keys or use a list (which then needs searching) to find the
real key and its value. You pay a one-time MD5 cost on each insert or
lookup, but the hash lookup itself is much faster since the keys are
short fixed-length digests.

If you have so few entries at each level (as I seem to recall you were
saying), then why not do a linear (or binary) search? What is good is
that the string compares will likely fail quickly, so you save all that
time. Hash lookups must compute a hash over the whole long key and then
do string compares anyway to verify a hit. If your keys are all
different lengths, you can take advantage of a length check to make the
compares faster, but that is cheating, as you are relying on an
implementation detail and not the true semantics of hashes.

So specify your data and its use better: how many entries are there in
each list to be hashed?

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
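
P.S. A minimal sketch of the two-level idea in Perl (the sub names and
the %by_digest layout are just illustrative, not from any existing
module): the top level is keyed by the md5_hex digest, and the second
level is a small hash on the real keys, so even a digest collision
still resolves to the right value.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my %by_digest;

# insert: bucket the real key under its hex MD5 digest.
# colliding digests simply share a second-level hash.
sub insert {
    my ($key, $value) = @_;
    $by_digest{ md5_hex($key) }{$key} = $value;
}

# lookup: one MD5 computation per call, then a lookup in a
# (usually tiny) second-level hash keyed by the real key.
sub lookup {
    my ($key) = @_;
    my $bucket = $by_digest{ md5_hex($key) } or return;
    return $bucket->{$key};
}

my $long_key = 'some very long key ' x 50;
insert($long_key, 42);
print lookup($long_key), "\n";    # prints 42
```

If the second level holds only a handful of entries, swapping that
inner hash for a sorted array plus binary search would match the
linear/binary-search suggestion above.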

