>>>>> "PN" == Palit, Nilanjan <[EMAIL PROTECTED]> writes:

  PN> Now, regarding Tom Metro's original suggestion for using an MD5
  PN> Digest: I read that the original MD5 algorithm has known issues
  PN> with collisions. Any experiences with how well Digest::MD5 does
  PN> when used with many millions of keys? Do I need to test for
  PN> collisions myself (at the expense of lost performance), or is it
  PN> pretty well tested (or proved?) to stand up to an intensive
  PN> application?

Theoretically you can have colliding MD5 hashes, so I wouldn't rely on
the digest alone. But you could do a two-level hash, with the top level
keyed by the MD5 digests; under each digest you would either hash again
on the real keys or use a list (which then needs searching) to find the
real key and its value. You pay a one-time MD5 cost on each insert or
lookup, but the hash lookup itself is much faster since the keys are
short fixed-length digests.

If you have so few entries at each level (as I seem to recall you were
saying), then why not do a linear (or binary) search? What is good is
that the string compares will likely fail quickly, so you save all that
time. Hash lookups must compute a hash over the whole long key and then
do string compares anyway to verify a hit. If your keys are all
different lengths, you can take advantage of a length check to make the
compares faster, but that is cheating, as you are relying on an
implementation detail and not the true semantics of hashes.

So specify your data and its use better: how many entries are there in
each list to be hashed?

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
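
P.S. A minimal sketch of the two-level idea in Perl (the sub names and
the %by_digest layout are just illustrative, not from any existing
module): the top level is keyed by the md5_hex digest, and the second
level is a small hash on the real keys, so even a digest collision
still resolves to the right value.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my %by_digest;

# insert: bucket the real key under its hex MD5 digest.
# colliding digests simply share a second-level hash.
sub insert {
    my ($key, $value) = @_;
    $by_digest{ md5_hex($key) }{$key} = $value;
}

# lookup: one MD5 computation per call, then a lookup in a
# (usually tiny) second-level hash keyed by the real key.
sub lookup {
    my ($key) = @_;
    my $bucket = $by_digest{ md5_hex($key) } or return;
    return $bucket->{$key};
}

my $long_key = 'some very long key ' x 50;
insert($long_key, 42);
print lookup($long_key), "\n";    # prints 42
```

If the second level holds only a handful of entries, swapping that
inner hash for a sorted array plus binary search would match the
linear/binary-search suggestion above.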

