The performance regression in question stemmed from a known, not-at-all mysterious, and deliberately temporary fix that stood until the weak integer hash function was replaced with a stronger one (the original problem). In that sense, a complaint about a performance regression across that commit is a red herring for anything _except_ as an example of what automated detection might have caught (which is what I took @timothee's comment to be mostly about).

Relying on hopping around memory to build a strong hash function is a bad idea 
because of memory latency. It is easy to fool yourself into believing otherwise, 
especially if you mis-estimate (by up to 10x) lookup latency as the 
"reciprocal throughput" of some hot-loop, uncontended-cache benchmark. Instead, 
you can just use a strong hash in the first place and incur (almost always, for 
small objects) just one cache miss, which is what Nim now does and has done 
since shortly after the mentioned "commit regression".
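To illustrate the kind of strong integer mixer meant here, below is a minimal Python sketch using the splitmix64 finalizer constants (an assumption for illustration; Nim's actual integer hash differs). The point is that one cheap register-only mixing pass spreads even sequential keys across buckets, so a lookup usually costs a single cache miss rather than a chain of dependent ones:

```python
MASK = (1 << 64) - 1  # emulate 64-bit wraparound in Python

def strong_mix(x: int) -> int:
    """Multiply/xorshift mixer (splitmix64 finalizer constants).

    A bijection on 64-bit integers: no mixing-induced collisions,
    and all input bits influence the low bits used to pick a bucket.
    """
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    return (x ^ (x >> 31)) & MASK

# Sequential keys land in well-scattered buckets of a power-of-two table.
buckets = [strong_mix(i) & 1023 for i in range(8)]
```

Because each step (xor-shift, multiply by an odd constant mod 2^64) is invertible, the whole mixer is a permutation of the 64-bit integers, so distinct keys always get distinct 64-bit hashes.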

While the pseudo-random probing idea is fundamentally bad from a lookup-latency 
point of view, the "compact ordered" idea of Python's tables is fine for 
purposes where insertion order matters. To answer your question, that idea is 
implemented in the `lptabz` module of <https://github.com/c-blake/adix> as well 
as in <https://github.com/LemonBoy/compactdict>.
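For readers unfamiliar with the layout, here is a toy Python sketch of the "compact ordered" idea (a sparse index table pointing into a dense, insertion-ordered entry array). It is illustrative only: no resizing or deletion, linear probing for brevity, and the class name is made up, not CPython's or adix's actual code:

```python
class CompactDict:
    """Sketch of a compact ordered table:
    - `index`: sparse, power-of-two slot array holding positions
    - `entries`: dense (hash, key, value) records in insertion order
    Iteration order falls out of the dense array for free.
    """

    def __init__(self, size: int = 8) -> None:
        self.index = [None] * size  # slot -> position in `entries`, or None
        self.entries = []           # dense, insertion-ordered records

    def _slot(self, h: int, key) -> int:
        mask = len(self.index) - 1
        i = h & mask
        while True:
            pos = self.index[i]
            if pos is None or self.entries[pos][1] == key:
                return i
            i = (i + 1) & mask      # linear probing for brevity

    def __setitem__(self, key, value) -> None:
        h = hash(key)
        i = self._slot(h, key)
        if self.index[i] is None:   # new key: append to dense array
            self.index[i] = len(self.entries)
            self.entries.append((h, key, value))
        else:                       # existing key: overwrite in place
            self.entries[self.index[i]] = (h, key, value)

    def __getitem__(self, key):
        pos = self.index[self._slot(hash(key), key)]
        if pos is None:
            raise KeyError(key)
        return self.entries[pos][2]

    def keys(self):
        return [k for _, k, _ in self.entries]
```

Overwriting an existing key updates its entry in place, so insertion order is preserved across updates, which matches the ordering semantics Python dicts advertise.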
