First, I should clarify my stance. I am not interested in developing 
SparseSet, but I am happy if, in the end, this leads to a better library. 
Also, the `SparseSet` is copied from @b3liever: I failed to import his library 
from nimble via URL, I don't have time to fix that, so I just copied the code 
with one modification, renaming `delete` to `del`.

The test is meant to be quick, dirty, and incomplete (originally I only wanted 
to spend one hour on it, but it ran over anyway). And the margins on common 
operations are already large enough that I am confident in rejecting claims 
like this:

> The one other advantage you might see for direct indexing is the same average 
> vs worst case per-element time cost. That sounds a lot better than hash 
> table's expected worst case per element ~log(table size). But for big tables 
> of tiny objects you can fit many per cache line and that cache load dominates 
> lookup time. So, worst case random access for the hash table can be more like 
> 2x the time cost of the average (especially if you have Robin Hood re-org 
> activated), not log(N) as much. What's more, in the non-compact case, hashing 
> can achieve just 1 cache line fetch almost all the time, while direct 
> indexing will usually take 2 cache loads. adix/althashes.hashRoMu1() is also 
> so fast as to be almost free. So, depending upon scale/features it is easy to 
> imagine linearly probed hashing being up to 2x faster than direct indexing, 
> insignificantly more variable, and possibly much more memory efficient. I 
> think this is a situation where naive "big O" analysis can give misleading 
> expectations.
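For readers unfamiliar with the data structure under discussion, here is a minimal sketch of the classic two-array sparse-set layout (field and proc names are my own, not necessarily @b3liever's): `sparse` maps a key to a slot in `dense`, so a membership test touches two separate arrays, which is the "2 cache loads" for direct indexing mentioned in the quote above.

```nim
type
  SparseSet = object
    sparse: seq[int]   # indexed by key; holds that key's position in `dense`
    dense: seq[int]    # packed list of the keys currently present

proc initSparseSet(capacity: int): SparseSet =
  SparseSet(sparse: newSeq[int](capacity), dense: @[])

proc contains(s: SparseSet, key: int): bool =
  # First load: sparse[key]; second load: dense[idx]. Both must agree.
  let idx = s.sparse[key]
  idx < s.dense.len and s.dense[idx] == key

proc incl(s: var SparseSet, key: int) =
  if not s.contains(key):
    s.sparse[key] = s.dense.len
    s.dense.add key

proc del(s: var SparseSet, key: int) =
  # swap-with-last removal keeps `dense` packed; O(1)
  if s.contains(key):
    let idx = s.sparse[key]
    let last = s.dense[^1]
    s.dense[idx] = last
    s.sparse[last] = idx
    s.dense.setLen(s.dense.len - 1)
```

Note the trade-off the quote is pointing at: every operation here is O(1) with no hashing, but `sparse` must be sized for the whole key range, and a lookup dereferences two arrays that generally sit on different cache lines.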

From my experience in debate, there is a technique called **focus shifting**: 
if A is talking about topic X, I can talk about topic Y; when A follows topic 
Y, I talk about topic Z... In the end, I drain all of A's energy and the focus 
is lost. In the real world, many politicians do similar things.

Back to our topic: you talked a lot about performance before, and after I 
tested speed, you talked about memory usage. If I test memory usage, will you 
talk about hash function selection? If I test hashes, will you talk about key 
access patterns, CPU architectures, multi-level caches, platforms, SMA, and so 
many other moving parts? In the end, the focus is simply lost, everyone's 
energy is drained, and there is no meaningful outcome.

You asked about the context; the context should be interpreted as the most 
typical situation: an out-of-the-box container used by ordinary users. You 
pointed out that I have methodological issues. I am open-minded about admitting 
mistakes, and I know the tests are not serious, but I need to see more data and 
tests to be convinced. Even if there are many factors, I think speed and space 
are the important metrics, and I expect that focusing on speed/space (or just 
speed) would lead to a more meaningful discussion.
