Re: AA rehash threshold

Steven Schveighoffer via Digitalmars-d Tue, 18 Nov 2014 18:57:00 -0800

On 11/18/14 9:46 PM, deadalnix wrote:

On Wednesday, 19 November 2014 at 02:16:36 UTC, Steven
Schveighoffer wrote:

From all I remember about hash tables, you are supposed to minimize
the number of collisions when adding/looking up data. This helps keep
the O(1) lookup property intact.


That is the theory, but the real world is sometime different. It
depends on how you resolve collisions. If you do it via some kind
of linked list, then yes, you must remove all collisions.


This is what AAs currently do.

But if you do it by putting the element in a slot close to its
actual position, then thing gets different. In that case, you
don't want to minimize collision, but to minimize the distance to
the object, in order for it to be in the cache line.


Heh, a load factor of 4 wouldn't work too well there :)

I do recall reading about all the different cache miss handling. Thewikipedia article I referenced has a lot of good info.


After all, unless your hash table is very small, the first hit is
most likely a cache miss, meaning ~300 cycles. at this point the
cache line is in L1, with a 3 cycle access time. So accessing
several element in the same cache line is often preferable.

In the case of AAs, all bucket elements are pointers, so this point ismoot for our purposes -- one may have to jump outside the cache to findthe first element. A good improvement (this is likely to come with afull library type) is to instead store an inline element for each bucketentry instead of just a pointer to an element. I recall when writingdcollections, this added a significant speedup.


-Steve

Re: AA rehash threshold

Reply via email to