I'm not familiar with the source code, but I would suspect that the global lock is to maintain the linked list in LRU order. As every read mutates the list, a reader/writer lock isn't appropriate. If that's the case, you may want to consider trading the slightly better hit rate for more concurrency (such as by using the Second Chance FIFO algorithm) or adopting a CAS-based LRU algorithm to reduce the critical section on the tail lock.
________________________________ From: Toru Maesaka <[email protected]> To: [email protected] Sent: Thursday, March 5, 2009 7:11:09 PM Subject: Ideas on Concurrency Control Strategy Hi! I was going to reply to Dustin's thread but noticed we haven't talked about development related goodness in a while so I'm starting a new thread. So, to begin with I'm going to describe what I'm going to experiment to get memcached scaling even further in modern hardware. As most of you know, at the moment we globally lock the cache with pthread_mutex_lock() to control concurrency. This is working fine at the moment but taking into account that in most use cases, the majority of the requests are GET related, I have a feeling that using pthread_rwlock_wrlock() and pthread_rwlock_rdlock() might give us substantial performance benefits without changing the codebase too much. For those that aren't familiar with pthread's read-write locking, it's a really simple mechanism that allows multiple threads to fetch a lock for reading IF a write lock isn't obtained by another thread. This means our worker threads can perform GET related commands without being blocked (again, if a write lock isn't taken by someone else). So in brief, fetching data from memcached should scale quite well in modern SMP architecture with this scheme. The downside of pthread's read-write locking is that it consumes more CPU than simple locking, and perhaps people are already happy with the current performance of memcached. For the rest of us that wants more performance out of memcached, we could design the codebase to only use read-write locking when the server is started with more than certain number of threads (e.g. > 8 threads). As for the global stats_lock, I think its safe to assume that most of the operations are write related so read-write locking is definitely not the answer here. I think Facebook was dead-on with their solution on scaling this area up (use scoreboarding per thread basis and aggregate the results when needed). There are other locks that needs to be looked at but I haven't got that far yet so I'll stop now :) To begin with I'm going to only experiment with using rw-locking for cache_lock. I'll test the performance with libmemcached's memslap and let you all know how it turned out... hopefully in not so distant future. Cheers, Toru
