> I've been trying to optimise the performance of memcached (1.6) on 10G
> Ethernet and in doing so have created a series of patches that enable
> it to scale to link speed.
>
> Presently, memcached is unable to provide more than 450K
> transactions-per-second (TPS) (as measured with Membase's memslap
> benchmark on a set of Solarflare SFC 9020 NICs) with the kernel TCP/IP
> stack and about 600K TPS with Solarflare's OpenOnload TCP/IP stack.
> With the patches it scales to about 850K TPS with the kernel TCP/IP
> stack and 1100K TPS with Solarflare's OpenOnload TCP/IP stack, as
> illustrated in the graph at http://www.cl.cam.ac.uk/~rss39/mm_comp.pdf
>
> I have tried to keep the changes as self-contained and small as
> possible and have tested them as extensively as I can, but I look
> forward to your feedback and comments on the set.
Thanks for open sourcing this! And thanks for attempting to keep the changes small and documenting each patch. That's a big help.

It'll probably be a while before any of us can verify or adopt these patches, but it's good to have them out there. I can give you some quick feedback which will also help the process.

Most of your changes are in default_engine/. A large part of the point of 1.6 is so we can "fork" this engine and modify it.

At a glance, I see that you've added the 32-bit hash to the item structure. I'm sad to say that almost all users of memcached care about its memory efficiency more than its vertical scalability, and 4 bytes per item can be horrendous for some workloads.

That can probably be worked on, but to start with I would recommend you actually fork the default_engine and port your changes into that, i.e.:

-> copy the default_engine tree to lockscale_engine (or whatever)
-> port your patches onto that
-> isolate the patches which touch the main tree

... then we can decide whether we want to distribute both engines and give users a choice, or keep one in the repo and slowly adopt the scaling changes from one to the other (if possible).

Thanks,
-Dormando
