In message <000001bdfbcb$c5ba5da0$[EMAIL PROTECTED]> you write:
> > From: Lee Iverson [mailto:[EMAIL PROTECTED]]
> >
> > Fair enough, but how and where is the index into the hash table
> > determined? If it's just a truncation (val%table_size) then you
> > should see real problems with object locality causing hash table
> > conflicts (i.e. linked lists with non-unit length). Knuth recommends
> > a 32-bit unsigned multiply by 2654435769U (a very simple
> > pseudo-randomizer) before the modulus to table size.
> >
>
> I think Classpath just does a truncation. It doesn't matter what you do:
> if your table is 100 entries wide and you have 101 objects, you're going
> to have "conflicts." If there were a "favoritism" for a single slot in the
> hash table, then there would be a problem. But none of the algorithms I've
> come across have that favoritism (this includes hash code), hence no need
> for randomization.
But that's not the issue. The issue is whether the hash spreads entries
evenly enough that a new entry's chance of landing in an empty slot,
`P(e)', tracks the fraction of the table that is actually free, `P(o)'.
An example where this breaks down is a hash table whose size is even
while all of the hash values are pointers to aligned data: the odd
indices are never hit, so half of the table's free slots can never be
reached and we effectively end up with P(e) ~= P(o)/2.
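To make the aligned-pointer case concrete, here is a throwaway C sketch
(the addresses and the table size of 100 are made up for illustration,
not taken from Classpath):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Hypothetical 8-byte-aligned addresses, like malloc() tends
           to hand back. */
        uintptr_t addrs[] = { 0x1000, 0x1008, 0x1010, 0x1018, 0x1020 };
        unsigned table_size = 100;   /* an even table size */
        int i;

        for (i = 0; i < 5; i++)
            printf("%#lx -> bucket %lu\n",
                   (unsigned long)addrs[i],
                   (unsigned long)(addrs[i] % table_size));
        /* Every bucket printed is even (96, 4, 12, 20, 28); the odd
           buckets never see an entry. */
        return 0;
    }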
Knuth suggests (and he's still right most of the time) that a good
tradeoff between all of the instruction-timing and algorithm issues is
to use power-of-2 hash table sizes and to randomize the hash with
(2654435769U * (u_int32_t)(h)); the index is then just the top bits of
the 32-bit product, so the "division" down to table size is a shift.
This tends to keep P(e) close to P(o) and avoids the (very expensive on
some architectures) integer division operation.
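For concreteness, a minimal C sketch of that scheme (hash_index and
table_bits are just names I've picked for illustration, not anything
in Classpath):

    #include <stdint.h>

    /* Knuth-style multiplicative hashing: multiply by the constant
       quoted above (roughly 2^32 / golden ratio) mod 2^32, then keep
       the top table_bits bits.  Assumes 0 < table_bits < 32, with the
       table size being 1 << table_bits. */
    static uint32_t hash_index(uint32_t h, unsigned table_bits)
    {
        /* The "division" by the power-of-2 table size is just a shift. */
        return (2654435769U * h) >> (32 - table_bits);
    }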
Perhaps not a big issue if the amount of hashing done is minimal, but
important in a lot of applications.
-------------------------------------------------------------------------------
Lee Iverson SRI International
[EMAIL PROTECTED] 333 Ravenswood Ave., Menlo Park CA 94025
http://www.ai.sri.com/~leei/ (650) 859-3307