I see; you are right, it makes more sense for this to be a client-side problem. I will write a small test program, and if the problem still exists I will get back to you.
Thanks for your help,
Saman

On Mon, Feb 9, 2015 at 4:39 PM, dormando <dorma...@rydia.net> wrote:
> If mc-crusher engages all threads, I'd suspect a memslap problem. That
> util has never worked very well.
>
> I've been validating releases on dual-socket 8-core (16 total + HT)
> machines and have gotten all of the threads to engage just fine.
>
> You could also write a small test program in a language of your choice
> which connects, runs a set and a get, then disconnects, a few hundred
> times. Then print the stats structures to see whether they were all
> being engaged.
>
> On Mon, 9 Feb 2015, Saman Barghi wrote:
>
> > Btw, I already considered avoiding hyper-threads and running all
> > server threads on the same socket (to avoid cross-socket latencies
> > and for better cache usage), but it does not seem to be a
> > hardware-related problem.
> >
> > On Mon, Feb 9, 2015 at 4:29 PM, Saman Barghi <sama...@gmail.com>
> > wrote:
> > Thanks for your response; find my answers below:
> >
> > On Mon, Feb 9, 2015 at 1:39 PM, dormando <dorma...@rydia.net> wrote:
> > > I am running some tests using memcached 1.4.22 on an Intel Xeon E5
> > > (4 sockets with 8 cores each, 2 hyper-threads per core, and 4 NUMA
> > > nodes) running Ubuntu Trusty. I compiled memcached with gcc 4.8.2
> > > with the default CFLAGS and configuration options.
> > >
> > > The problem is that whenever I start memcached with an odd number
> > > of server threads (3, 5, 7, 9, 11, ...), everything is OK and all
> > > threads engage in processing requests; the status of all threads
> > > is "Running". However, if I start the server with an even number
> > > of threads (2, 4, 6, 8, ...), half of the threads are always in
> > > sleep mode and do not engage in servicing clients. This is related
> > > to memcached, as memaslap, for example, runs with no such pattern.
> > > I ran the exact same test on an AMD Opteron and things are OK with
> > > memcached. So my question is: is there any specific tuning
> > > required for Intel machines?
> > > Is there any specific flag or some part of the code that might
> > > cause worker threads to not engage?
> > >
> > > Thanks,
> > > Saman
> >
> > That is pretty weird. I've not run it on a quad socket, but plenty of
> > Intel machines without problem. Modern ones too.
> >
> > I see; I am not sure why it happens, because everything is very
> > straightforward with memcached.
> >
> > How many clients are you telling memslap to use? Can you try
> > https://github.com/dormando/mc-crusher quickly? (Run loadconf/similar
> > to load some values, then a different one to hammer it.)
> >
> > I fire memaslap with the same number of threads as memcached, and
> > with concurrency 20 per thread, so enough to keep the server threads
> > busy.
> >
> > It seems that using mc-crusher, all threads are engaged when loading
> > and with mget_test. So does that mean there is something fishy with
> > memaslap?
> >
> > Connections are dispersed via thread.c:dispatch_conn_new():
> >
> >     int tid = (last_thread + 1) % settings.num_threads;
> >     LIBEVENT_THREAD *thread = threads + tid;
> >     last_thread = tid;
> >
> > which is pretty simple at the base.
> >
> > Right, I printed out 'last_thread' to make sure nothing funny is
> > happening, but it's a perfect round robin.
> >
> > If you can gdb up, can you dump the per-thread stats structures?
> > That will show definitively whether those threads ever get work or
> > not.
> >
> > I ran memcached with -t 8, the client side is
> >
> >     memaslap -s localhost -T 8 -c 160 -t 1m
> >
> > and below is the gdb output for the per-thread stats structures. It
> > seems that every other thread is not doing anything!? I can also
> > confirm that all memaslap threads are consuming 100% of the cores
> > they are running on, and again, with an odd number of server threads
> > this does not happen. You can see the kind of results I get when
> > running memcached and increasing the number of threads on that
> > machine. It is not consistent at all!
> >
> > Thanks,
> > Saman
> >
> > (gdb) print threads[0].stats
> > $21 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
> >       __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
> >       __list = {__prev = 0x0, __next = 0x0}},
> >     __size = '\000' <repeats 39 times>, __align = 0},
> >   get_cmds = 0, get_misses = 0, touch_cmds = 0, touch_misses = 0,
> >   delete_misses = 0, incr_misses = 0, decr_misses = 0, cas_misses = 0,
> >   bytes_read = 0, bytes_written = 0, flush_cmds = 0, conn_yields = 0,
> >   auth_cmds = 0, auth_errors = 0,
> >   slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 201 times>}}
> > (gdb) print threads[1].stats
> > $22 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
> >       __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
> >       __list = {__prev = 0x0, __next = 0x0}},
> >     __size = '\000' <repeats 39 times>, __align = 0},
> >   get_cmds = 4466788, get_misses = 3863038, touch_cmds = 0,
> >   touch_misses = 0, delete_misses = 0, incr_misses = 0,
> >   decr_misses = 0, cas_misses = 0, bytes_read = 861116495,
> >   bytes_written = 693448306, flush_cmds = 0, conn_yields = 0,
> >   auth_cmds = 0, auth_errors = 0,
> >   slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 12 times>,
> >     {set_cmds = 496327, get_hits = 603750, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0},
> >     {set_cmds = 0, get_hits = 0, touch_hits = 0, delete_hits = 0,
> >       cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 188 times>}}
> > (gdb) print threads[2].stats
> > $23 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
> >       __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
> >       __list = {__prev = 0x0, __next = 0x0}},
> >     __size = '\000' <repeats 39 times>, __align = 0},
> >   get_cmds = 0, get_misses = 0, touch_cmds = 0, touch_misses = 0,
> >   delete_misses = 0, incr_misses = 0, decr_misses = 0, cas_misses = 0,
> >   bytes_read = 0, bytes_written = 0, flush_cmds = 0, conn_yields = 0,
> >   auth_cmds = 0, auth_errors = 0,
> >   slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 201 times>}}
> > (gdb) print threads[3].stats
> > $24 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
> >       __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
> >       __list = {__prev = 0x0, __next = 0x0}},
> >     __size = '\000' <repeats 39 times>, __align = 0},
> >   get_cmds = 4120462, get_misses = 3550471, touch_cmds = 0,
> >   touch_misses = 0, delete_misses = 0, incr_misses = 0,
> >   decr_misses = 0, cas_misses = 0, bytes_read = 794355485,
> >   bytes_written = 654105157, flush_cmds = 0, conn_yields = 0,
> >   auth_cmds = 0, auth_errors = 0,
> >   slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 12 times>,
> >     {set_cmds = 457849, get_hits = 569991, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0},
> >     {set_cmds = 0, get_hits = 0, touch_hits = 0, delete_hits = 0,
> >       cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 188 times>}}
> > (gdb) print threads[4].stats
> > $25 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
> >       __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
> >       __list = {__prev = 0x0, __next = 0x0}},
> >     __size = '\000' <repeats 39 times>, __align = 0},
> >   get_cmds = 0, get_misses = 0, touch_cmds = 0, touch_misses = 0,
> >   delete_misses = 0, incr_misses = 0, decr_misses = 0, cas_misses = 0,
> >   bytes_read = 0, bytes_written = 0, flush_cmds = 0, conn_yields = 0,
> >   auth_cmds = 0, auth_errors = 0,
> >   slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 201 times>}}
> > (gdb) print threads[5].stats
> > $26 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
> >       __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
> >       __list = {__prev = 0x0, __next = 0x0}},
> >     __size = '\000' <repeats 39 times>, __align = 0},
> >   get_cmds = 4038230, get_misses = 3493086, touch_cmds = 0,
> >   touch_misses = 0, delete_misses = 0, incr_misses = 0,
> >   decr_misses = 0, cas_misses = 0, bytes_read = 778500650,
> >   bytes_written = 626164950, flush_cmds = 0, conn_yields = 0,
> >   auth_cmds = 0, auth_errors = 0,
> >   slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 12 times>,
> >     {set_cmds = 448710, get_hits = 545144, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0},
> >     {set_cmds = 0, get_hits = 0, touch_hits = 0, delete_hits = 0,
> >       cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 188 times>}}
> > (gdb) print threads[6].stats
> > $27 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
> >       __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
> >       __list = {__prev = 0x0, __next = 0x0}},
> >     __size = '\000' <repeats 39 times>, __align = 0},
> >   get_cmds = 0, get_misses = 0, touch_cmds = 0, touch_misses = 0,
> >   delete_misses = 0, incr_misses = 0, decr_misses = 0, cas_misses = 0,
> >   bytes_read = 0, bytes_written = 0, flush_cmds = 0, conn_yields = 0,
> >   auth_cmds = 0, auth_errors = 0,
> >   slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 201 times>}}
> > (gdb) print threads[7].stats
> > $28 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
> >       __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
> >       __list = {__prev = 0x0, __next = 0x0}},
> >     __size = '\000' <repeats 39 times>, __align = 0},
> >   get_cmds = 4472436, get_misses = 3868324, touch_cmds = 0,
> >   touch_misses = 0, delete_misses = 0, incr_misses = 0,
> >   decr_misses = 0, cas_misses = 0, bytes_read = 862203585,
> >   bytes_written = 693881564, flush_cmds = 0, conn_yields = 0,
> >   auth_cmds = 0, auth_errors = 0,
> >   slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 12 times>,
> >     {set_cmds = 496953, get_hits = 604112, touch_hits = 0,
> >       delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0},
> >     {set_cmds = 0, get_hits = 0, touch_hits = 0, delete_hits = 0,
> >       cas_hits = 0, cas_badval = 0, incr_hits = 0,
> >       decr_hits = 0} <repeats 188 times>}}

--

---
You received this message because you are subscribed to the Google Groups
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.