In message <000001bdfbcb$c5ba5da0$[EMAIL PROTECTED]> you write:
> > From: Lee Iverson [mailto:[EMAIL PROTECTED]]
> >
> > Fair enough, but how and where is the index into the hash table
> > determined? If it's just a truncation (val%table_size) then you
> > should see real problems with object locality causing hash table
> > conflicts (i.e. linked lists with non-unit length). Knuth recommends
> > a 32-bit unsigned multiply by 2654435769U (a very simple
> > pseudo-randomizer) before the modulus to table size.
> >
>
> I think Classpath just does a truncation. It doesn't matter what you do:
> if your table is 100 entries wide and you have 101 objects, you're going
> to have "conflicts." If there were a "favoritism" for a single slot in the
> hash table, then there would be a problem. But none of the algorithms I've
> come across have that favoritism (this includes hash code), hence no need
> for randomization.
But that's not the issue. The issue is whether the hash spreads entries
evenly enough that a new entry's chance of landing in an empty slot,
`P(e)', tracks the fraction of the table that is actually free, `P(o)'.
An example where this breaks down is a hash table whose size is even
while all of the hash values are pointers to aligned data: the odd
indices are never hit, so half of the table's free slots can never be
reached and we effectively end up with P(e) ~= P(o)/2.
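To make the aligned-pointer case concrete, here is a throwaway C sketch
(the addresses and the table size of 100 are made up for illustration,
not taken from Classpath):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Hypothetical 8-byte-aligned addresses, like malloc() tends
           to hand back. */
        uintptr_t addrs[] = { 0x1000, 0x1008, 0x1010, 0x1018, 0x1020 };
        unsigned table_size = 100;   /* an even table size */
        int i;

        for (i = 0; i < 5; i++)
            printf("%#lx -> bucket %lu\n",
                   (unsigned long)addrs[i],
                   (unsigned long)(addrs[i] % table_size));
        /* Every bucket printed is even (96, 4, 12, 20, 28); the odd
           buckets never see an entry. */
        return 0;
    }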
Knuth suggests (and he's still right most of the time) that a good
tradeoff between all of the instruction-timing and algorithm issues is
to use power-of-2 hash table sizes and to randomize the hash with
(2654435769U * (u_int32_t)(h)); the index is then just the top bits of
the 32-bit product, so the "division" down to table size is a shift.
This tends to keep P(e) close to P(o) and avoids the (very expensive on
some architectures) integer division operation.
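For concreteness, a minimal C sketch of that scheme (hash_index and
table_bits are just names I've picked for illustration, not anything
in Classpath):

    #include <stdint.h>

    /* Knuth-style multiplicative hashing: multiply by the constant
       quoted above (roughly 2^32 / golden ratio) mod 2^32, then keep
       the top table_bits bits.  Assumes 0 < table_bits < 32, with the
       table size being 1 << table_bits. */
    static uint32_t hash_index(uint32_t h, unsigned table_bits)
    {
        /* The "division" by the power-of-2 table size is just a shift. */
        return (2654435769U * h) >> (32 - table_bits);
    }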
Perhaps not a big issue if the amount of hashing done is minimal, but
important in a lot of applications.
-------------------------------------------------------------------------------
Lee Iverson SRI International
[EMAIL PROTECTED] 333 Ravenswood Ave., Menlo Park CA 94025
http://www.ai.sri.com/~leei/ (650) 859-3307