Yeah. Removing contended locks gives more speedup. But given the performance numbers from 1.4.15, going even faster than that is almost useless. It's very hard to get your network to perform up to those levels.
Though there's still room for improvement. Are you just reading the code
academically, or do you have a problem you're trying to solve?

On Fri, 4 Jan 2013, liubo wrote:

> Removing the global mutexes will give more speedup, right?
>
> 2013/1/4 liubo <[email protected]>
> For example, slabs_lock, or some other global mutex.
>
> 2013/1/4 dormando <[email protected]>
> > Hello.
> > I found that all stats are protected by the thread's mutex.
> > All events run in a single thread's context.
> >
> > Why is the protection needed: for the sum, or for the STAT command?
> >
> > thanks
>
> It's so that when the summation happens, you get consistent reads.
>
> NOTES, SINCE I HEAR THIS A LOT:
>
> *uncontested* mutexes aren't free, but are very nearly free. *contested*
> mutexes slow things down a lot.
>
> Since those thread locks are only ever contended in the brief times in
> which you actually run stats commands, they have a very very small amount
> of overhead.
>
> When I was doing the lock scaling patches for 1.4.10-1.4.15 I did test
> this out:
>
> https://github.com/dormando/memcached/commit/56ad41e1a19a7fc99da51bdca4fdcb524a300984
>
> (a little further work would be required to make that change permanent).
> On 64bit systems you can do 64bit-aligned 8 byte memory reads atomically,
> so as long as the stats structure is all 64bit items, is 64bit aligned,
> and the external reader is ... just a reader, you can get pretty accurate
> readings. On 32bit you need the lock.
>
> So I thought I'd try removing the locks on my 64bit system and test it.
> There was *ALMOST NO* change in performance. I can't stress this enough.
> Everyone focuses on these locks but if you bust out a God Damned Ruler
> they don't even use crap for cycles. The other work I did ended up having
> a much higher effect when tested, and I merged those branches instead. I
> think it was between 1-5% change in speed. By comparison making the lock
> shorter in the item_alloc code was a 15-30% bump.
>
> It'll be nice to remove the uncontested locks and save some CPU, but it
> was a much lower priority than other work.
>
> have fun,
> -Dormando
>
> --
> liubo
>
> --
> liubo
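
To make the 64bit point above concrete, here is a minimal sketch of per-thread counters that are summed without taking a lock. This is not memcached's actual code: thread_stats, stats_incr, stats_sum_gets, run_demo and MAX_THREADS are made-up names for illustration. It also swaps in C11 relaxed atomics for the plain aligned reads described above, so the compiler is not free to tear or cache the accesses; on a typical 64bit target these should compile to ordinary 8 byte loads and stores.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 16 /* hypothetical cap, for the demo only */

/* Hypothetical per-thread counters: every field is 64 bits wide, and each
 * struct is aligned to 64 bytes so two threads never share a cache line. */
struct thread_stats {
    alignas(64) _Atomic uint64_t get_cmds;
    _Atomic uint64_t set_cmds;
};

struct worker {
    struct thread_stats *stats;
    uint64_t iterations;
};

static struct thread_stats all_stats[MAX_THREADS];

/* Writer side: only the owning thread updates its counters, so a relaxed
 * load followed by a relaxed store is enough. No mutex, no locked RMW. */
void stats_incr(_Atomic uint64_t *counter) {
    uint64_t v = atomic_load_explicit(counter, memory_order_relaxed);
    atomic_store_explicit(counter, v + 1, memory_order_relaxed);
}

void *worker_main(void *arg) {
    struct worker *w = arg;
    for (uint64_t i = 0; i < w->iterations; i++) {
        stats_incr(&w->stats->get_cmds);
    }
    return NULL;
}

/* Reader side: sum the counters without any lock. Each individual load is
 * untorn, but while workers are running the total is only "pretty
 * accurate", not a consistent snapshot across threads. */
uint64_t stats_sum_gets(int nthreads) {
    uint64_t total = 0;
    for (int i = 0; i < nthreads; i++) {
        total += atomic_load_explicit(&all_stats[i].get_cmds,
                                      memory_order_relaxed);
    }
    return total;
}

/* Starts nthreads workers, waits for them, and returns the summed counter.
 * After pthread_join the sum is exact, because no writer is still running. */
uint64_t run_demo(int nthreads, uint64_t iterations) {
    pthread_t tids[MAX_THREADS];
    struct worker workers[MAX_THREADS];
    int started = 0;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    for (int i = 0; i < nthreads; i++) {
        atomic_store(&all_stats[i].get_cmds, 0);
        workers[i].stats = &all_stats[i];
        workers[i].iterations = iterations;
        if (pthread_create(&tids[i], NULL, worker_main, &workers[i]) != 0) {
            break; /* stop spawning; join whatever did start */
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    return stats_sum_gets(started);
}
```

Calling run_demo(4, 100000) should return 400000 once all workers have joined. The trade-off is the one described above: a reader that sums while the workers are still running sees each counter whole, but not all counters at the same instant, which is what the per-thread lock buys you. On a 32bit target the 8 byte accesses are not guaranteed to be lock-free, so there you would keep the lock.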
