I wrote: "most lookups are failed lookups", but oops: 291e3 unique / 4.8e6 total = 6% ==> 94% of lookups are successful. So the `memcmp` that happens only after hash codes match runs almost all the time, and my particular data set is more like 16B/(32B+9B) = 39% cached, not 50%. This doesn't really alter the rest of my analysis, though, since 39% and 50% are pretty close.
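For concreteness, here is the corrected arithmetic as a tiny snippet (all byte counts and totals are the ones quoted above; the variable names are just for illustration):

```nim
let unique = 291e3   # distinct keys
let total  = 4.8e6   # total lookups
echo "miss rate: ", unique / total         # ~0.06, i.e. ~94% of lookups hit
# With hits dominating, the memcmp past the hash-code check runs almost
# every time, so the cached fraction is closer to 16B/(32B+9B):
echo "cached:    ", 16.0 / (32.0 + 9.0)    # ~0.39, not 0.50
```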
A corollary (for that style of data set) is that Robin Hood hashing with linear probing would not help. RH can help for failed lookups because probe sequences for missing items become about 50% as long: once the search hash code passes the table's hash code, you know the key is missing and where to put it. It costs a little extra work (about 1.15x) to keep entries along probe sequences sorted by hash code. So RH only pays off if you care about worst-case times or have a workload dominated by failed lookups over long probe sequences.

At ≈57% table load (291K/512K) the sequences aren't long unless the hash function is weak relative to the data (not the case for me, unless all 4 functions were equally weak). Anyway, RH is about the only algorithmic speed-up I can think of that could help the poorly cached large-table case, but only for very low hit-rate data. It would not be crazy to add a `defined(useRobinHood)` branch to the Nim hash table impl (or even a per-instance run-time flag), since whether RH helps or hurts depends so much on the workload.
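Since the early exit is the whole point, here is a minimal sketch of RH with linear probing. It uses the standard probe-distance ("displacement") formulation of the invariant, which is equivalent to keeping probe sequences ordered; `RHTable`, `Slot`, the small `tabSize`, and the hcode==0 empty sentinel are all illustrative assumptions, not the actual Nim stdlib layout:

```nim
import std/hashes

const tabSize = 512        # power of two; the real table above is far larger,
                           # 512 just keeps the sketch small

type
  Slot = object
    hcode: Hash            # 0 reserved as the "empty" sentinel (assumption)
    key: string
  RHTable = object
    slots: array[tabSize, Slot]

proc dist(home, at: int): int =
  ## Distance of slot `at` from a key's home bucket, with wrap-around.
  (at - home) and (tabSize - 1)

proc find(t: RHTable, key: string): int =
  ## Index of `key`, or -1. A failed lookup stops early: once a slot's
  ## occupant is closer to its own home than we are to ours, `key` cannot
  ## appear later in the sequence (and that slot is the insertion point).
  let h = hash(key)
  var i = int(h) and (tabSize - 1)
  var d = 0
  while true:
    if t.slots[i].hcode == 0:
      return -1                       # empty slot: key absent
    if dist(int(t.slots[i].hcode) and (tabSize - 1), i) < d:
      return -1                       # richer occupant: key absent
    if t.slots[i].hcode == h and t.slots[i].key == key:
      return i                        # hash codes match, then full compare
    i = (i + 1) and (tabSize - 1)
    inc d

proc insert(t: var RHTable, key: string) =
  ## The ~1.15x extra work: displace any occupant closer to its home than
  ## the incoming entry ("take from the rich"), keeping sequences ordered.
  var s = Slot(hcode: hash(key), key: key)
  var i = int(s.hcode) and (tabSize - 1)
  var d = 0
  while t.slots[i].hcode != 0:
    let od = dist(int(t.slots[i].hcode) and (tabSize - 1), i)
    if od < d:
      swap(t.slots[i], s)             # evict the richer entry, carry it on
      d = od
    i = (i + 1) and (tabSize - 1)
    inc d
  t.slots[i] = s
```

The early `return -1` in `find` is what makes failed probes roughly half as long; hits pay full price either way, which is why RH buys nothing for a 94%-hit workload like the one above.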