On 06/07/2012 12:20 PM, Dag Sverre Seljebotn wrote:
On 06/07/2012 12:26 AM, Robert Bradshaw wrote:
On Wed, Jun 6, 2012 at 2:36 PM, Dag Sverre Seljebotn
<d.s.seljeb...@astro.uio.no> wrote:
On 06/06/2012 11:16 PM, Robert Bradshaw wrote:
On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn
<d.s.seljeb...@astro.uio.no> wrote:
On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote:
On 06/05/2012 12:30 AM, Robert Bradshaw wrote:
I just found http://cmph.sourceforge.net/ which looks quite interesting. Though the resulting hash functions are supposedly cheap, I have the feeling that branching is considered cheap in this context.
Actually, this lead was *very* promising. I believe this is the very first reference I actually read through and didn't eliminate after the abstract, and it totally swept away our home-grown solutions!

"Hash & Displace" by Pagh (1999) is actually very simple, easy to understand, and fast both for generation and (branch-free) lookup:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf
The idea is:
- Find a hash `g(x)` to partition the keys into `b` groups (the paper requires b > 2n, though I think in practice you can often get away with less)
- Find a hash `f(x)` such that f is 1:1 within each group (which is easily achieved since groups only have a few elements)
- For each group, from largest to smallest: find a displacement `d[group]` so that `f(x) ^ d` doesn't cause collisions.
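A minimal sketch of the resulting branch-free lookup, assuming a table struct with fields `r`, `m1`, `m2`, `d` and `entries` (all names hypothetical), with g(x) = h & m2 and f(x) = (h >> r) & m1:

```python
from collections import namedtuple

# Hypothetical table layout: r/m1/m2 are per-table constants, d is the
# displacement array, entries holds the vtable slots.
Table = namedtuple("Table", "r m1 m2 d entries")

def lookup(table, h):
    # g(x): low bits of the precomputed key hash pick the displacement slot
    group = h & table.m2
    # f(x) ^ d[group]: shifted hash bits, xored with the group's
    # displacement, give the final slot -- no branching involved
    return table.entries[((h >> table.r) ^ table.d[group]) & table.m1]
```

When h is a compile-time constant, everything except the two memory loads folds away.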
It requires extra storage for the displacement table. However, I think 8 bits per element might suffice even for vtables of 512 or 1024 in size. Even with 16 bits it's rather negligible compared to the minimum-128-bit entries of the table.
I benchmarked these hash functions:

displace1: ((h >> r1) ^ d[h & 63]) & m1
displace2: ((h >> r1) ^ d[h & m2]) & m1
displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1

Only the third one is truly in the spirit of the algorithm, but I think the first two should work well too (and when h is known at compile time, looking up d[h & 63] isn't harder than looking up r1 or m1).
My computer is acting up and all my numbers today are slower than the earlier ones (yes, I disabled turbo-mode in the BIOS a year ago, and yes, I've pinned the CPU speed). But here are today's numbers, compiled with -DIMHASH:
direct:     min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=2400000000.000000
index:      min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000
twoshift:   min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=1800000000.000000
threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 val=1800000000.000000
displace1:  min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=1800000000.000000
displace2:  min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=1800000000.000000
displace3:  min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=1800000000.000000
I did a dirty prototype of the table-finder as well and it works:
https://github.com/dagss/hashvtable/blob/master/pagh99.py
The paper obviously puts more effort into minimizing table size than into a fast lookup. My hunch is that our choice should be

((h >> table.r) ^ table.d[h & m2]) & m1

and use 8-bit d (because even if you have 1024 methods, you'd rather double the number of bins than have those 2 extra bits available for displacement options).
Then keep incrementing the size of d and the number of table slots (in such an order that the total vtable size is minimized) until success. In practice this should almost always just increase the size of d, and keep the table size at the lowest 2**k that fits the slots (even for 64 or 128 methods :-))

Essentially we avoid the shift in the argument to d[] by making d larger.
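That search order could be sketched like this (a hypothetical driver; `build` stands in for the table-construction step and returns None on failure):

```python
def find_table(keys, build):
    # Start at the lowest 2**k table size that fits all the methods,
    # with |d| equal to the number of slots.
    n_slots = 1
    while n_slots < len(keys):
        n_slots *= 2
    b = n_slots
    while True:
        result = build(keys, n_slots, b)
        if result is not None:
            return n_slots, b, result
        # Growing |d| is the cheap option; only double the table itself
        # once the displacement array has already been grown a few times.
        # (The 4x cutoff here is an arbitrary illustration.)
        if b < 4 * n_slots:
            b *= 2
        else:
            n_slots *= 2
            b = n_slots
```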
Nice. I'm surprised that the indirection on d doesn't cost us much;

Well, table->d[const & const] compiles down to the same kind of code as table->m1. I guess I'm surprised too that displace2 doesn't penalize.

hopefully its size wouldn't be a big issue either. What kinds of densities were you achieving?
OK, simulation results just in (for the displace2 hash), and they exceeded my expectations. I always fill the table with n=2^k keys, and fix b = n (b means |d|). Then the failure rates are (top two are 100,000 simulations, the rest are 1000 simulations):
n= 8 b= 8 failure-rate=0.0019 try-mean=4.40 try-max=65
n= 16 b= 16 failure-rate=0.0008 try-mean=5.02 try-max=65
n= 32 b= 32 failure-rate=0.0000 try-mean=5.67 try-max=25
n= 64 b= 64 failure-rate=0.0000 try-mean=6.60 try-max=29
n= 128 b= 128 failure-rate=0.0000 try-mean=7.64 try-max=22
n= 256 b= 256 failure-rate=0.0000 try-mean=8.66 try-max=37
n= 512 b= 512 failure-rate=0.0000 try-mean=9.57 try-max=26
n=1024 b= 1024 failure-rate=0.0000 try-mean=10.66 try-max=34
Try-mean and try-max are how many r's needed to be tried before success, so they give an indication of how much margin is left before failure. For the ~1/1000 chance of failure for n=8 and n=16, we would proceed to let b=2*n (100,000 simulations):
n= 8 b= 16 failure-rate=0.0001 try-mean=2.43 try-max=65
n= 16 b= 32 failure-rate=0.0000 try-mean=3.40 try-max=65
NOTE: The 512...2048 results were with 16-bit displacements; with 8-bit displacements they mostly failed. So we either need to make each element of d 16 bits, or, e.g., store 512 entries in a 1024-slot table (which succeeded most of the time with 8-bit displacements). I'm +1 on 16-bit displacements.
The algorithm is rather fast and concise:
https://github.com/dagss/hashvtable/blob/master/pagh99.py
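The core of it can be sketched like this (a simplified illustration of the hash & displace step, not the actual pagh99.py code; it tries one shift r and reports failure with None):

```python
def build_displacements(hashes, r, n_slots, b):
    # Bucket the (precomputed) key hashes into b groups by their low bits,
    # then assign each group a displacement, largest groups first, so that
    # no two keys end up in the same vtable slot.
    m1, m2 = n_slots - 1, b - 1
    groups = {}
    for h in hashes:
        groups.setdefault(h & m2, []).append(h >> r)
    used = set()
    d = [0] * b
    for g, fs in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        for disp in range(n_slots):
            slots = {(f ^ disp) & m1 for f in fs}
            if len(slots) == len(fs) and not (slots & used):
                used |= slots
                d[g] = disp
                break
        else:
            return None  # no displacement worked; retry with another r
    return d
```

In the real generator one would loop over candidate r values (the try-mean/try-max numbers above count exactly those retries) and grow b on total failure.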
The algorithm is designed for 100% density in the table itself. (We can lift that to compensate for a small space of possible hash functions, I guess.) I haven't done proper simulations yet, but I just tried |vtable|=128, |d|=128 from the command line and I had 15 successes or so before the first failure. That's with 100% density in the vtable itself! (And when it fails, you increase |d| to get your success.)
The caveat is the space spent on d (it's small in comparison, but that's why this isn't too good to be true).
A disadvantage might be that we may no longer have the opportunity to make the table size something other than a power of two (i.e. replace the mask with "if (likely(slot < n))"). I think for that to work one would need to replace the xor group with addition on Z_d.
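If I read that right, one possible shape of such a lookup (purely hypothetical; it uses a plain modulo where the likely()-guard would go, and `n` is the not-necessarily-power-of-two slot count) would be:

```python
from collections import namedtuple

# Hypothetical layout; n is the (not necessarily power-of-two) slot count.
ModTable = namedtuple("ModTable", "r m2 n d entries")

def lookup_mod(table, h):
    # Displacing by addition modulo n, instead of xor followed by a mask,
    # keeps the displaced slot in range for any n.
    return table.entries[((h >> table.r) + table.d[h & table.m2]) % table.n]
```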
Going back to the idea of linear probing on a cache miss, this has the advantage that one can write a brain-dead provider that sets m=0 and simply lists the methods instead of requiring a table optimizer. (Most tools, of course, would do the table optimization.) It also lets you get away with a "kind-of good" hash rather than requiring you to search until you find a (larger?) perfect one.
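Such a fallback might look like this (a hypothetical sketch; entries here carry the method ID alongside the function pointer, and a brain-dead provider sets r, m2 and d to zero so the slot is just the masked hash):

```python
from collections import namedtuple

# Hypothetical layout: entries are (method_id, func) pairs so probing can
# compare IDs; m1 + 1 is the table size.
ProbeTable = namedtuple("ProbeTable", "r m1 m2 d entries")

def lookup_probing(table, h):
    # Ideal case: the perfect-hash slot hits on the first comparison.
    slot = ((h >> table.r) ^ table.d[h & table.m2]) & table.m1
    n = table.m1 + 1
    # Linear probing fallback: scan forward, comparing stored IDs, so an
    # unoptimized table still works (just slower).
    for i in range(n):
        method_id, func = table.entries[(slot + i) % n]
        if method_id == h:
            return func
    return None  # method not in the table
```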
Well, given that we can have 100% density, and generating the table is lightning fast, and the C code to generate the table is likely a 300-line utility... I'm not convinced.
It goes from an extraordinarily simple spec (the table is, at minimum, a func[2^k] with a couple of extra zero fields, whose struct can be statically defined in the source by hand) to something that is, well, not complicated in the absolute sense, but much more so than the definition above. It is also variable-size, which makes allocating it globally/on a stack a pain (I suppose one can choose an upper bound for |d| and |vtable|). I am a bit playing devil's advocate here; it's probably just a (minor) con, but worth noting at least.
If you were willing to go the interning route, so that you didn't need to fill the table with md5 hashes anyway, I'd say you'd have a stronger point :-)
Here's a good reason to demand perfect hashing in the callee: suppose you want to first check the interface once, then keep using the vtable -- e.g., *if* we want Cython to raise TypeError on the interface coercion *and* we decide we don't want to mess with constructing C++-style vtables on the fly, then code like this:

cdef f(SomeInterface obj):
    return obj.some_method(1.0)

would simply expect that the vtable contained the method, and skip the ID comparison entirely. No comparison is faster than either 64-bit hash comparison or interned-pointer comparison. :-)
I'm not saying the above decisions must be made, but the possibility
seems reason enough to demand perfect hashing.
Dag
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel