On Tuesday, 17 February 2015 03:38:55 UTC+7, Dormando wrote:
>
>
> Again, in actual benchmarks I've not been able to prove them to be a 
> problem. In one of the gist links I provided before I show an "all miss" 
> case, which acquires/releases a bucket lock but does not have the overhead 
> of processing the value. In those cases it was able to process over 50 
> million keys per second. The internals don't tend to be the slow part 
> anymore.
>
Seems that that was with 32 threads, hence throughput per thread is very 
roughly 1.5 million keys/sec.
That means 600-700 ns of latency per op (or about 300 ns, if it was with 16 
threads).
Maybe that is not the major part of the thousands of ns an average memcached 
op takes now, but it is a considerable amount to optimize.
An uncontended spin lock should take only dozens of ns.
Also, if the benchmark just queries an empty table, the cache is uncontended 
and the memory of the mutex structures is not evicted as quickly as under 
normal conditions, so such a test tends to show faster mutex ops than they 
actually are.


What about optimizing snprintf()?
- For each slab class, we know an upper bound on the number of digits in the 
value length
- Hand-written itoa()s instead of snprintf()?
- Optimize / precompute the most probable combinations of flags in decimal 
representation
- If values are consistently sized (or is that rarely the case for 
memcached?) or fall into one of several size classes,
  an ultra-thin hash table of precomputed value lengths should help

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
