On 11/20/14 5:30 PM, Jerry Quinn wrote:
Steven Schveighoffer <[email protected]> writes:
On 11/18/14 9:46 PM, deadalnix wrote:
After all, unless your hash table is very small, the first hit is
most likely a cache miss, meaning ~300 cycles. at this point the
cache line is in L1, with a 3 cycle access time. So accessing
several element in the same cache line is often preferable.
In the case of AAs, all bucket elements are pointers, so this point is moot
for our purposes -- one may have to jump outside the cache to find the first
element. A good improvement (this is likely to come with a full library type)
is to instead store an inline element for each bucket entry instead of just a
pointer to an element. I recall when writing dcollections, this added a
significant speedup.
This works nicely for small types, but has gotchas. For example, if
you've got an AA of ints, what value indicates that this is a value
folded into the bucket entry vs actually being a pointer? You'll need
extra space to make sure it's safe. Alignment is another concern.
Let's say you have bool for the entry/element flag. With ints you've
doubled the size of the bucket array.
Hm.. the bucket entry will simply be what a node is now. You are
actually saving space, because you don't need to store that extra
pointer to the first node.
Where it gets dicey is when you rehash. Some of the nodes will be moved
into the bucket space, some will move out of it. But I don't think it's
a big deal, as a rehash doesn't happen very often.
If you have large objects as the elements, you'll waste space. Probably
the implementation needs to be switchable depending on the size of the object.
Yes, this is also a concern. This almost necessitates a minimum load as
well as a maximum.
As an aside, I find it invaluable to have statistics about hash table
buckets, ala C++11. When building custom hash functions and hash table
implementations, it's almost necessary.
I'd like to see bucket stats available in some manner for built-in AAs.
If they're going away for a library type, we should have such stat info
as part of the API.
Absolutely all of this could be added to a library type.
-Steve