On Tuesday, 28 February 2017 at 21:00:05 UTC, H. S. Teoh wrote:
On Tue, Feb 28, 2017 at 12:57:14PM -0500, Andrei Alexandrescu
via Digitalmars-d wrote:
This is of possible interest:
https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable/ --
Related to this, recently I found some interesting papers for
large external (i.e., on-disk) hash tables:
http://www.itu.dk/people/pagh/papers/linear.pdf
http://www.weizhewei.com/papers/pods020-wei.pdf
AIUI, blocked probing (1st paper) is similar to Robin Hood
hashing, in that inserting an entry may cause an existing entry
to be moved out of its occupied slot to a different one, but
blocked probing also has additional interesting characteristics:
- Scanning is done in blocks of size 2^i with starting slots of
index
2^i for incrementing i. This structure makes it
cache-oblivious, and
thus able to take advantage of the modern cache hierarchy
without
needing cache-specific tuning.
- Something special about the 2^i block sizes makes the math
just work
out (see 2nd paper), so that lookups have expected average 1 +
1/2^Ω(b) I/O operations, where b is the block size (provided
the load
factor is bounded away from 1), which is a marked improvement
over
plain linear probing which has expected 1 + O(α/b) I/O
operations,
where α is the load factor.
I didn't look closely enough at the analysis to know for sure,
but it seems that since the analysis is cache-oblivious, the
O(1 + 1/2^Ω(b)) I/O operations should also generalize to cache
misses as well (as part of the memory hierarchy, if you regard
secondary storage as the lowest level of the hierarchy). So
I'm expecting this might be even faster than the Robin Hood
hashing in your linked blog.
T
I liked that article. I didn't really understand the point about
implementation of modulo primes, maybe I missed something. Given
that our man is doing modulo a 'known' value (he had a switch
statement to get to them), why not do something rather cheaper
than a compiler-expanded constant div/mod made up of multiplies
and shifts
const uint power2 = 512; // say, some 1 << n anyway
const uint prime = 509; // some prime just below the power,
some prime > power2/2
static assert( power2 - 1 - prime < prime );
x = x & ( power2 - 1 );
x = ( x >= prime ) ? x - prime : x;
which is good news on my x86 with GDC -O3 (only 3 operations, and
sub cmovx ) - all well provided you make sure that you are
getting CMOVx not branches. I could work out the power from the
prime using CTFE given a bit of thought. Maybe CTFE could even do
the reverse?
Have I finally gone mad?