On 11/18/14 9:46 PM, deadalnix wrote:
On Wednesday, 19 November 2014 at 02:16:36 UTC, Steven
Schveighoffer wrote:
From all I remember about hash tables, you are supposed to minimize
the number of collisions when adding/looking up data. This helps keep
the O(1) lookup property intact.


That is the theory, but the real world is sometime different. It
depends on how you resolve collisions. If you do it via some kind
of linked list, then yes, you must remove all collisions.

This is what AAs currently do.

But if you do it by putting the element in a slot close to its
actual position, then thing gets different. In that case, you
don't want to minimize collision, but to minimize the distance to
the object, in order for it to be in the cache line.

Heh, a load factor of 4 wouldn't work too well there :)

I do recall reading about all the different cache miss handling. The wikipedia article I referenced has a lot of good info.


After all, unless your hash table is very small, the first hit is
most likely a cache miss, meaning ~300 cycles. at this point the
cache line is in L1, with a 3 cycle access time. So accessing
several element in the same cache line is often preferable.

In the case of AAs, all bucket elements are pointers, so this point is moot for our purposes -- one may have to jump outside the cache to find the first element. A good improvement (this is likely to come with a full library type) is to instead store an inline element for each bucket entry instead of just a pointer to an element. I recall when writing dcollections, this added a significant speedup.

-Steve

Reply via email to