Oh, and in the [adix](https://github.com/c-blake/adix) git history, I do have a 
pseudo-random probing impl along with one that has an idea by @timothee to 
start linear probing and only switch to pseudo-random after some number of 
cache lines. Those used to be called `otset.nim` and `ottab.nim` and more or 
less are exactly the Python 3.6 dict impl. I decided this whole general idea 
space was a distraction from better designs and just deleted it.

The Python algo may be OK for the _very_ limited "language-symbol-table" use 
cases Python mostly applies it to (where, IMO, it really should not be doing 
run-time hash lookups in the first place). It is just obviously bad for very 
large tables. It "doubles up" the already-more-indirect-than-necessary old 
Python pseudo-random design: each step in the collision chain must fetch the 
index out of the hash-table part and then the hash code out of the real data 
part. A cold-cache, poorly predicted lookup with a collision search depth of 4 
could take 8 DIMM load latencies of 60..120 ns each, i.e. roughly 0.5..1 µs.
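
A toy Python sketch of that compact-dict layout (illustrative names only, not 
CPython's actual internals, and no resizing): note how every probe step does 
two dependent loads, one from the sparse index array and one from the dense 
entry array.

```python
# Toy sketch of a Python-3.6-style "compact dict": a sparse index
# array points into a dense, insertion-ordered entries array.
# Hypothetical simplification; CPython's real code differs.
EMPTY = -1

class CompactDict:
    def __init__(self, nslots=8):        # no resizing: toy only
        self.indices = [EMPTY] * nslots  # sparse "hash-table part"
        self.entries = []                # dense (hash, key, value) part

    def _probe(self, h):
        # Linear probing for clarity; CPython actually uses a
        # perturbed pseudo-random probe sequence.
        mask = len(self.indices) - 1
        i = h & mask
        while True:
            yield i
            i = (i + 1) & mask

    def lookup(self, key):
        h = hash(key)
        for slot in self._probe(h):
            idx = self.indices[slot]          # load 1: the index
            if idx == EMPTY:
                return None
            eh, ek, ev = self.entries[idx]    # load 2: hash from data part
            if eh == h and ek == key:
                return ev

    def insert(self, key, value):
        h = hash(key)
        for slot in self._probe(h):
            idx = self.indices[slot]
            if idx == EMPTY:
                self.indices[slot] = len(self.entries)
                self.entries.append((h, key, value))
                return
            eh, ek, _ = self.entries[idx]
            if eh == h and ek == key:
                self.entries[idx] = (h, key, value)
                return
```

The dependent `indices[slot]` then `entries[idx]` pair is the "doubling up": 
on a cold cache each probe step is two serialized memory round trips.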

A simple way to improve this by 2x, used in `lptabz`, is to store some of the 
hash code along with the index in the "hash-table-like part". If the hash-table 
part then uses linear probing, the whole lookup reduces to just 2 loads. That is 
still 2x more than the 1 load of the "unordered" variant, but you get 
compactness & insertion order in exchange (as well as optimal performance when 
iterating over elements).
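
A sketch of that trick in the same toy Python style (a hypothetical 
simplification, not adix's actual Nim code): each sparse slot caches a few 
bits of the hash next to the index, so a mismatching slot is rejected after 
just one load, and the dense entries array is usually touched only once per 
lookup.

```python
# Sketch of the lptabz-style idea: cache a hash fragment in the
# sparse slot so most probe steps need only ONE load.
# Hypothetical names; no resizing: toy only.
EMPTY = None
HBITS = 8                        # hash bits cached per slot
HMASK = (1 << HBITS) - 1

class TaggedCompactDict:
    def __init__(self, nslots=8):
        self.slots = [EMPTY] * nslots   # (hash_fragment, entry_index) pairs
        self.entries = []               # dense (hash, key, value) part

    def lookup(self, key):
        h = hash(key)
        frag = h & HMASK
        mask = len(self.slots) - 1
        i = h & mask
        while True:                     # linear probing
            s = self.slots[i]           # load 1: fragment + index together
            if s is EMPTY:
                return None
            sfrag, idx = s
            if sfrag == frag:           # only now touch the data part
                eh, ek, ev = self.entries[idx]   # load 2: rare on mismatch
                if eh == h and ek == key:
                    return ev
            i = (i + 1) & mask

    def insert(self, key, value):
        h = hash(key)
        frag = h & HMASK
        mask = len(self.slots) - 1
        i = h & mask
        while True:
            s = self.slots[i]
            if s is EMPTY:
                self.slots[i] = (frag, len(self.entries))
                self.entries.append((h, key, value))
                return
            sfrag, idx = s
            if sfrag == frag:
                eh, ek, _ = self.entries[idx]
                if eh == h and ek == key:
                    self.entries[idx] = (h, key, value)
                    return
            i = (i + 1) & mask
```

Since the fragment filters out nearly all non-matching slots, a typical hit 
costs one load on the (linearly scanned, cache-friendly) sparse part plus one 
load on the data part, while still keeping the compact, insertion-ordered 
entries array.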
