Dear Dormando,

Regarding "binding one memcached instance per NUMA node": should we understand a "NUMA node" to mean a single core on a 4-core Intel i3/i5 processor?
So "numactl --cpunodebind=0 ./memcached -m 4000 -t 4" will bind memcached instance to a CPU core, right?

Thanks again!

On Tuesday, April 17, 2012 8:56:31 AM UTC+8, Dormando wrote:
>
> > The business scenario requires:
> >
> > 50M key-value pairs, 2K each, 100G memory in total.
> >
> > About 40% of key-value will change in a second.
> >
> > The Java application need Get() once and set() once for each changed
> > pair, it will be 50M*40%*2=4M qps (query per second).
> >
> > We tested memcached - which shows very limited qps.
> > Our benchmarking is very similar to results showed here
> > http://xmemcached.googlecode.com/svn/trunk/benchmark/benchmark.html
> >
> > 10,000 around qps is the limitation of one memcached server.
> >
> > That mean we need 40 partitioned memcached servers in our business
> > scenario - which seems very uneconomic and unrealistic.
> >
> > In your experience, is the benchmarking accurate in term of memcached’s
> > designed performance?
> >
> > Any suggestion to tune memcached system(client or server)?
> >
> > Or any other alternative memory store system that is able meet the
> > requirement more economically?
> >
> > Many thanks in advance!
>
> You should share your actual benchmark code. Also, what version of
> memcached, OS, network, etc?
>
> After 1.4.10, a single memcached instance can do nearly one million sets
> per second:
>
> http://groups.google.com/group/memcached/browse_thread/thread/972a4cf1f2c1b017/b3aaf416639e81a6
>
> There are a lot of things you need to tune to get that level of
> performance in a real scenario, however:
>
> - fast network. you will be limited by your packets per second. a single
> gige nic might not do more than 600,000 per second, but also could be as
> low as 250,000 before packet loss.
>
> - batch as many commands as you can (using binary protocol, with
> "noreply"). fewer round trips, fewer packets on the wire.
>
> - use as many clients as you can (a single connection doing synchronous
> sets will be slow in *any* benchmark)
>
> - as noted in the above link, binding one memcached instance per NUMA node
> can improve performance
>
> - tune the number of threads correctly
>
> - always use the latest version
>
> performance should continue to improve over the coming months, but it's
> very difficult to see results of the improvements on actual hardware. I'd
> say you'd need 10 half decent servers to achieve that level of performance
> and have good headroom. If you really tune things hard you could get that
> down to 6. If you left me alone in a room for a few months with a giant
> pile of money I could do it with two. three for redundancy.
>
> -Dormando
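
For reference, the batching advice in the quoted reply can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: it uses the memcached text protocol's "set ... noreply" form (the quoted reply mentions the binary protocol, which is not shown here), and the helper name build_noreply_batch, the sample keys, and the localhost address in the comment are hypothetical.

```python
def build_noreply_batch(items, flags=0, exptime=0):
    """Pack several text-protocol `set` commands into one buffer.

    `noreply` asks the server not to answer each command, so the whole
    batch can be sent in one write instead of one round trip per key.
    """
    parts = []
    for key, value in items:
        # Text protocol: set <key> <flags> <exptime> <bytes> [noreply]\r\n<data>\r\n
        header = f"set {key} {flags} {exptime} {len(value)} noreply\r\n"
        parts.append(header.encode("ascii") + value + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    batch = build_noreply_batch([("k1", b"hello"), ("k2", b"world!")])
    print(batch)
    # Sending it, assuming a test server on 127.0.0.1:11211 (hypothetical):
    #   import socket
    #   with socket.create_connection(("127.0.0.1", 11211)) as sock:
    #       sock.sendall(batch)
```

One trade-off to keep in mind: with noreply the client gets no per-command status, so failed sets are not reported back and would have to be detected some other way.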
