Sean, This is actually a bit tricky, because when retrieving or putting values in the open table (or map) you must not stop on removed keys and continue until you either find an empty slot, or a key different then the one you are looking for.
The test case is like this here: imagine you have three keys, k1, k2 and k3 and ALL of them have hash collision on the same slot. This makes a chain of lookups, for instance: slot1 K1 slot2 K2 slot3 K3 slot4 EMPTY if you remove K2, this becomes: slot1 K1 slot2 REMOVED slot3 K3 slot4 EMPTY now, retrieving the value of K3 will return junk in the implementation you suggest below... which is also a bug in the current implementation of FastMap, I believe. I think (ehm) I implemented it properly in HPPC, you can peek in there if you want to. As long as you don't mix removes with adds, your implementation runs fine. Dawid On Wed, Jan 27, 2010 at 12:32 PM, Sean Owen <sro...@gmail.com> wrote: > FWIW this is also the implementation I used, also nicked from Knuth > (with credit), in my "Fast" set and map implementations. I also > believe it to be correct. > > private int find(long key) { > int theHashCode = (int) key & 0x7FFFFFFF; // make sure it's positive > long[] keys = this.keys; > int hashSize = keys.length; > int jump = 1 + theHashCode % (hashSize - 2); > int index = theHashCode % hashSize; > long currentKey = keys[index]; > while (currentKey != NULL && (currentKey == REMOVED || key != currentKey)) > { > if (index < jump) { > index += hashSize - jump; > } else { > index -= jump; > } > currentKey = keys[index]; > } > return index; > } > > > On Wed, Jan 27, 2010 at 8:38 AM, Dawid Weiss <dawid.we...@gmail.com> wrote: >> The implementation in Colt is correct, it is double addressing, the >> value of the second hash is always relatively prime to the first one >> (and must not be zero). The colt's implementation can be rewritten as: >> >> const_increment = 1 + h % (m - 2); >> >> if you do a loop >> >> while (true) >> { >> slot = (slot + const_increment) % m; >> } >> >> you will eventually iterate all slots between 0...m-1, no matter what >> const_increment's value was chosen. As far as I remember, Knuth also >> suggests to pick h2 (the second hash function) as a function of the >> first hash function, don't have the book with me right now though. >> >> D. >> >