Sean,

This is actually a bit tricky, because when retrieving or putting
values in the open table (or map) you must not stop on removed keys
and continue until you either find an empty slot, or a key different
then the one you are looking for.

The test case is like this here: imagine you have three keys, k1, k2
and k3 and ALL of them have hash collision on the same slot. This
makes a chain of lookups, for instance:

slot1 K1
slot2 K2
slot3 K3
slot4 EMPTY

if you remove K2, this becomes:

slot1 K1
slot2 REMOVED
slot3 K3
slot4 EMPTY

now, retrieving the value of K3 will return junk in the implementation
you suggest below... which is also a bug in the current implementation
of FastMap, I believe. I think (ehm) I implemented it properly in
HPPC, you can peek in there if you want to.

As long as you don't mix removes with adds, your implementation runs fine.

Dawid


On Wed, Jan 27, 2010 at 12:32 PM, Sean Owen <sro...@gmail.com> wrote:
> FWIW this is also the implementation I used, also nicked from Knuth
> (with credit), in my "Fast" set and map implementations. I also
> believe it to be correct.
>
>  private int find(long key) {
>    int theHashCode = (int) key & 0x7FFFFFFF; // make sure it's positive
>    long[] keys = this.keys;
>    int hashSize = keys.length;
>    int jump = 1 + theHashCode % (hashSize - 2);
>    int index = theHashCode % hashSize;
>    long currentKey = keys[index];
>    while (currentKey != NULL && (currentKey == REMOVED || key != currentKey)) 
> {
>      if (index < jump) {
>        index += hashSize - jump;
>      } else {
>        index -= jump;
>      }
>      currentKey = keys[index];
>    }
>    return index;
>  }
>
>
> On Wed, Jan 27, 2010 at 8:38 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:
>> The implementation in Colt is correct, it is double addressing, the
>> value of the second hash is always relatively prime to the first one
>> (and must not be zero). The colt's implementation can be rewritten as:
>>
>> const_increment = 1 + h % (m - 2);
>>
>> if you do a loop
>>
>> while (true)
>> {
>>  slot = (slot + const_increment) % m;
>> }
>>
>> you will eventually iterate all slots between 0...m-1, no matter what
>> const_increment's value was chosen. As far as I remember, Knuth also
>> suggests to pick h2 (the second hash function) as a function of the
>> first hash function, don't have the book with me right now though.
>>
>> D.
>>
>

Reply via email to