I think there are both issues with your benchmark that make the results less 
clear than you think {eg. you do not do PGO with a fast hash function, which 
shrinks ratios 3x for me; some operation time ratios seem suspicious {why is 
del 4x, not 2x, slower than write? better prefetching? PRNGs do not actually 
block prefetching}; and you do not report the memory usage/overhead of the 
memory-optimized ditab vs the spacious SparseSet} and also optimizations of 
your SparseSet that you missed {eg. `clear` can be fixed cost even for 20 Mi 
entries, as both your geeksforgeeks link & you mentioned; see the sketch just 
below}.
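
To make that last point concrete, here is a minimal sketch of the classic 
dense/sparse-array trick {names & layout are hypothetical, not necessarily 
@jackhftang's actual SparseSet; it assumes int32 keys below a preallocated 
capacity}:

    type
      SparseSet = object
        dense: seq[int32]    # the n live keys, packed together
        sparse: seq[int32]   # sparse[key] = index into dense; may be stale
        n: int32             # number of live entries

    proc initSparseSet(cap: int): SparseSet =
      # cap must exceed the largest key you plan to insert
      SparseSet(dense: newSeq[int32](cap), sparse: newSeq[int32](cap), n: 0)

    proc contains(s: SparseSet; key: int32): bool =
      let i = s.sparse[key]
      i < s.n and s.dense[i] == key    # stale sparse slots are harmless

    proc incl(s: var SparseSet; key: int32) =
      if key notin s:
        s.dense[s.n] = key
        s.sparse[key] = s.n
        inc s.n

    proc excl(s: var SparseSet; key: int32) =
      if key in s:                     # swap-with-last keeps dense packed
        let i = s.sparse[key]
        dec s.n
        s.dense[i] = s.dense[s.n]
        s.sparse[s.dense[i]] = i

    proc clear(s: var SparseSet) =
      s.n = 0                          # fixed cost: never touches the big arrays

With this layout `clear` is fixed cost no matter the capacity, which is 
exactly the missed optimization I meant.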

Just as one data point, though, for your

    initTable 20971520 del                         4.91s  203.81m
    initDITab 20971520 del            177.68%      2.76s  362.12m
    initSparseSet 20971520 del        466.64%      1.05s  951.05m

With just that PGO & a faster hash, in a few minutes I got an almost 3x 
better perf ratio:

    initTable 20971520 del                      932.34ms     1.07
    initDITab 20971520 del             93.50%   997.21ms     1.00
    initSparseSet 20971520 del        171.33%   544.18ms     1.84
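
For reference, the kind of change I mean is roughly this {a sketch only: the 
build flags assume a gcc backend, and the splitmix64-style finalizer is a 
stand-in, not the exact hash I used}:

    # PGO with a gcc backend: compile instrumented, train, recompile, eg.:
    #   nim c -d:danger --passC:-fprofile-generate --passL:-fprofile-generate bench
    #   ./bench     # training run
    #   nim c -d:danger --passC:-fprofile-use --passL:-fprofile-use bench
    import std/hashes

    proc hash(x: int): Hash =
      # splitmix64-style finalizer as a drop-in integer hash; `Table`
      # picks up whatever `hash` overload is in scope at instantiation.
      var h = cast[uint64](x)
      h = (h xor (h shr 30)) * 0xbf58476d1ce4e5b9'u64
      h = (h xor (h shr 27)) * 0x94d049bb133111eb'u64
      cast[Hash](h xor (h shr 31))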

Notice how, as I alluded to, Table is not _that_ much slower out of cache 
{and the space-optimized ditab is even slower, as also alluded to, though I 
would love it if someone could speed up `sequint` :-) }. 1.7x may matter, or 
it may re-pay for itself in memory savings that let something fit in 
RAM/L3/L2/L1 that would not otherwise, etc. This is also just one pair of 
CPUs, my Skylake and whatever @jackhftang used, which may also matter.

I am not sure even the above numbers are meaningful, though. It seems best to 
just raise methodological issues over at your linked github repo where 
interested readers can follow along if they care, as this is likely to get 
even deeper into the detail weeds {and has already uncovered at least one bug 
of mine. Oops! :-) }. I will do that later today/this weekend.

It should be uncontroversial, though, that any one set of benchmarks like 
this is unlikely to decide the question "in context", especially when it does 
not measure memory use. Maybe the use case wants 20,000 tables of 20,000 
entries, or maybe the key address space is so much easier to loosely bound 
than to tightly bound that the space optimization has more power. {Direct 
indexing/the sparse array is particularly vulnerable to this; see the 
back-of-envelope just below.} This is perhaps covered by your 
"functionalities, etc.", but it relates to the general 
an-application-itself-is-the-best-benchmark property which **_always_** bears 
repeating.
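
To put a number on that vulnerability {illustrative figures, not from this 
benchmark}:

    const
      liveKeys   = 20_000
      looseBound = 1 shl 32       # a key space you can only loosely bound
      slotBytes  = 4              # eg. an int32 per sparse slot
    # The sparse array pays for the bound; the hash table pays for live keys:
    echo "sparse array: ", looseBound * slotBytes, " bytes"  # 16 GiB, always
    echo "hash table:   ", liveKeys * 2 * 16, " bytes"       # ~640 KB at 50% load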

So, I'm all for people doing their own data structures as needed - Nim makes 
that super easy. Hooray! :-) But I _also_ agree with @jackhftang's [earlier 
comment](https://forum.nim-lang.org/t/7236#45750) that it's best to do 
APIs/code structure first and then profile to see what needs work (a profile 
is usually better than microbenchmarks) rather than just assume Table will 
have a problem, as the linked YouTube video does in its motivation (before 
also naively overselling Bloom filters, but that is a whole other spiel). My 
later comments are mostly all just in support of that, since it sounded like 
@b3liever was [still figuring out what he even 
needed/wanted](https://forum.nim-lang.org/t/7236#45749). Cheers, all.
