On 23-3-2010 1:51, Andrei Alexandrescu wrote: > Currently, each entry in a D hashtable stores the hash of the key for > efficiency purposes. > > There is a bit of redundancy: as soon as you entered a hash bucket you > know that the hashkey % tablesize is a specific number, call it k. But k > is not enough information to restore the hash, so the actual hash gets > stored for each entry. > > I'm thinking of exploiting that redundancy by storing the hash > multiplier, not the hash value. Instead of storing h in each slot, store > m = h / tablesize. Then you can restore the actual h by writing: > > restored_h = m * tablesize + k; > > k is known for each given bucket (it's the index of the bucket) and m is > what gets stored per entry. > > What's the advantage of this approach? Well the advantage is that m is a > small number. Any hash function will try to disperse the hash value as > much as possible between the 32 available bits. But m will be a smaller > number, and therefore will be more grouped and will engender fewer false > pointers. > > Would this help?
Probably not. hash is an 'int' and this 'm' will still use up the same space as an int, I reckon. Even if it ends up being 16-bit and can be put in a short, alignment will probably cause it to consume 32-bits anyway. What's more the 16-bit access (not to mention the multiplier) will slow the AA down. I say, leave it be. L.
