So I finally found what was wrong with memaslap. While doing some benchmarking recently I realized that memaslap was creating twice the number of connections I asked for, and not sending anything over every other connection. So if I asked for -T 2 -c 2 -n 1, instead of 4 connections I would get 8, and since memcached assigns connections to threads in a round-robin manner, only half of the server threads had active connections.
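To see why that halves the busy threads, here is a toy simulation (my own sketch, not memcached code) of the round-robin dispatch that dormando's dispatch_conn_new() snippet further down in this thread uses, assuming the used and unused sockets of each pair reach the server back to back:

    /* toy model of round-robin connection dispatch with paired sockets */
    #include <stdio.h>

    int main(void) {
        int num_threads = 8;      /* e.g. memcached -t 8 */
        int requested   = 8;      /* connections actually asked of memaslap */
        int last_thread = -1;

        for (int c = 0; c < requested; c++) {
            /* memaslap opens two sockets per requested connection,
             * but only ever sends on one of them */
            for (int dup = 0; dup < 2; dup++) {
                /* same formula as dispatch_conn_new() */
                int tid = (last_thread + 1) % num_threads;
                last_thread = tid;
                printf("conn %d (%s socket) -> worker %d\n",
                       c, dup == 0 ? "used" : "unused", tid);
            }
        }
        return 0;
    }

With an even number of workers the unused sockets always land on the same half of the threads (which half depends on which socket of each pair memaslap actually drives), so every other worker never sees traffic. With an odd worker count the stride of two wraps around and every thread still gets real connections, which matches odd -t values looking fine.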
I debugged a bit more and narrowed the problem down to getaddrinfo() in ms_network_connect() in ms_conn.c returning two addrinfo structures, then facepalm: I was using localhost:11211 as the server address, so getaddrinfo() treated the host as multihomed and returned both "0.0.0.0" and "127.0.0.1", and memaslap created two sockets for each connection. I guess localhost should maybe be removed from the memaslap help text, because this can produce inaccurate benchmark results if it happens to someone else.
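In case anyone wants to check their own setup, here is a minimal standalone sketch of the same resolution pattern (it is not the actual ms_conn.c code and the hints/names are mine): if getaddrinfo() hands back more than one result for "localhost" and a socket is created for every entry in the list, you silently get an extra socket per requested connection.

    /* sketch: how many addresses does "localhost" resolve to? */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>

    int main(void) {
        struct addrinfo hints, *ai, *p;
        memset(&hints, 0, sizeof(hints));
        hints.ai_family   = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;

        int rc = getaddrinfo("localhost", "11211", &hints, &ai);
        if (rc != 0) {
            fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
            return 1;
        }

        int n = 0;
        for (p = ai; p != NULL; p = p->ai_next) {
            char host[NI_MAXHOST];
            if (getnameinfo(p->ai_addr, p->ai_addrlen, host, sizeof(host),
                            NULL, 0, NI_NUMERICHOST) == 0)
                printf("result %d: %s\n", ++n, host);
            /* memaslap ends up with one socket per result here */
        }
        freeaddrinfo(ai);
        return 0;
    }

If this prints more than one address (memaslap's own lookup gave me "0.0.0.0" and "127.0.0.1"), the benchmark will double its connection count; pointing memaslap at the numeric 127.0.0.1 instead of localhost sidesteps the problem.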
Thanks,
Saman

On Monday, 9 February 2015 16:49:52 UTC-5, Saman Barghi wrote:
>
> I see, you are right, it makes more sense for it to be a client-side problem in this case. I will write a small test program and if the problem still exists I will get back to you.
>
> Thanks for your help,
> Saman
>
> On Mon, Feb 9, 2015 at 4:39 PM, dormando <dorma...@rydia.net> wrote:
>
>> If mc-crusher engages all threads I'd suspect a memslap problem. That util's never worked very well.
>>
>> I've been validating releases on dual socket 8 core (16 total + HT) machines and have gotten all of the threads to engage just fine.
>>
>> You could also write a small test program in a language of your choice which connects, runs a set and a get, then disconnects a few hundred times. Then print the stats structures to see if they were all being engaged.
>>
>> On Mon, 9 Feb 2015, Saman Barghi wrote:
>>
>> > Btw, I already considered avoiding hyperthreads and running all server threads on the same socket (to avoid cross-socket latencies and get better cache usage), but it does not seem to be a hardware-related problem.
>> >
>> > On Mon, Feb 9, 2015 at 4:29 PM, Saman Barghi <sama...@gmail.com> wrote:
>> > Thanks for your response, find my responses below:
>> >
>> > On Mon, Feb 9, 2015 at 1:39 PM, dormando <dorma...@rydia.net> wrote:
>> > > I am running some tests using memcached 1.4.22 on an Intel Xeon E5 (4 sockets with 8 cores each, 2 hyperthreads per core, and 4 NUMA nodes) running Ubuntu trusty. I compiled memcached with gcc-4.8.2 with default CFLAGS and configuration options.
>> > >
>> > > The problem is that whenever I start memcached with an odd number of server threads (3, 5, 7, 9, 11, ...) everything is OK and all threads engage in processing requests; the status of all threads is "Running". However, if I start the server with an even number of threads (2, 4, 6, 8, ...), half of the threads are always in sleep mode and do not engage in servicing clients. This is related to memcached, as memaslap, for example, runs with no such pattern. I ran the exact same test on an AMD Opteron and things are OK with memcached. So my question is: is there any specific tuning required for Intel machines? Is there any specific flag or some part of the code that might cause worker threads to not engage?
>> > >
>> > > Thanks,
>> > > Saman
>> >
>> > That is pretty weird. I've not run it on a quad socket but plenty of Intel machines without problem. Modern ones too.
>> >
>> > I see, I am not sure why it happens because everything is very straightforward with memcached.
>> >
>> > How many clients are you telling memslap to use? Can you try https://github.com/dormando/mc-crusher quickly? (run loadconf/similar to load some values, then a different one to hammer it)
>> >
>> > I fire memaslap with the same number of threads as memcached, and with concurrency 20 per thread, so enough to keep the server threads busy.
>> >
>> > It seems that using mc-crusher, all threads are engaged when loading and with mget_test. So does it mean there is something fishy with memaslap?
>> >
>> > Connections are dispersed via thread.c:dispatch_conn_new()
>> >
>> >     int tid = (last_thread + 1) % settings.num_threads;
>> >     LIBEVENT_THREAD *thread = threads + tid;
>> >     last_thread = tid;
>> >
>> > which is pretty simple at the base.
>> >
>> > Right, I printed out 'last_thread' to make sure nothing funny is happening, but it's a perfect round robin.
>> >
>> > If you can gdb up can you dump the per-thread stats structures? That will show definitively if those threads ever get work or not.
>> >
>> > I ran memcached with -t 8 and the client side is
>> >
>> >     memaslap -s localhost -T 8 -c 160 -t 1m
>> >
>> > and below is the gdb output for the per-thread stats structures. It seems that every other thread is not doing anything!? Also I can confirm that all memaslap threads are consuming 100% of the cores they are running on, and again, with an odd number of server threads this does not happen. You can see the kind of results I get when running memcached and increasing the number of threads on that machine. It is not consistent at all!
>> >
>> > Thanks,
>> > Saman
>> >
>> > (gdb) print threads[0].stats
>> > $21 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
>> >   __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, get_cmds = 0, get_misses = 0,
>> >   touch_cmds = 0, touch_misses = 0, delete_misses = 0, incr_misses = 0, decr_misses = 0, cas_misses = 0, bytes_read = 0,
>> >   bytes_written = 0, flush_cmds = 0, conn_yields = 0, auth_cmds = 0, auth_errors = 0, slab_stats = {{set_cmds = 0,
>> >   get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 201 times>}}
>> > (gdb) print threads[1].stats
>> > $22 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
>> >   __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, get_cmds = 4466788,
>> >   get_misses = 3863038, touch_cmds = 0, touch_misses = 0, delete_misses = 0, incr_misses = 0, decr_misses = 0,
>> >   cas_misses = 0, bytes_read = 861116495, bytes_written = 693448306, flush_cmds = 0, conn_yields = 0, auth_cmds = 0,
>> >   auth_errors = 0, slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0,
>> >   cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 12 times>, {set_cmds = 496327, get_hits = 603750,
>> >   touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0}, {set_cmds = 0,
>> >   get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 188 times>}}
>> > (gdb) print threads[2].stats
>> > $23 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
>> >   __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, get_cmds = 0, get_misses = 0,
>> >   touch_cmds = 0, touch_misses = 0, delete_misses = 0, incr_misses = 0, decr_misses = 0, cas_misses = 0, bytes_read = 0,
>> >   bytes_written = 0, flush_cmds = 0, conn_yields = 0, auth_cmds = 0, auth_errors = 0, slab_stats = {{set_cmds = 0,
>> >   get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 201 times>}}
>> > (gdb) print threads[3].stats
>> > $24 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
>> >   __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, get_cmds = 4120462,
>> >   get_misses = 3550471, touch_cmds = 0, touch_misses = 0, delete_misses = 0, incr_misses = 0, decr_misses = 0,
>> >   cas_misses = 0, bytes_read = 794355485, bytes_written = 654105157, flush_cmds = 0, conn_yields = 0, auth_cmds = 0,
>> >   auth_errors = 0, slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0,
>> >   cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 12 times>, {set_cmds = 457849, get_hits = 569991,
>> >   touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0}, {set_cmds = 0,
>> >   get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 188 times>}}
>> > (gdb) print threads[4].stats
>> > $25 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
>> >   __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, get_cmds = 0, get_misses = 0,
>> >   touch_cmds = 0, touch_misses = 0, delete_misses = 0, incr_misses = 0, decr_misses = 0, cas_misses = 0, bytes_read = 0,
>> >   bytes_written = 0, flush_cmds = 0, conn_yields = 0, auth_cmds = 0, auth_errors = 0, slab_stats = {{set_cmds = 0,
>> >   get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 201 times>}}
>> > (gdb) print threads[5].stats
>> > $26 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
>> >   __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, get_cmds = 4038230,
>> >   get_misses = 3493086, touch_cmds = 0, touch_misses = 0, delete_misses = 0, incr_misses = 0, decr_misses = 0,
>> >   cas_misses = 0, bytes_read = 778500650, bytes_written = 626164950, flush_cmds = 0, conn_yields = 0, auth_cmds = 0,
>> >   auth_errors = 0, slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0,
>> >   cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 12 times>, {set_cmds = 448710, get_hits = 545144,
>> >   touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0}, {set_cmds = 0,
>> >   get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 188 times>}}
>> > (gdb) print threads[6].stats
>> > $27 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
>> >   __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, get_cmds = 0, get_misses = 0,
>> >   touch_cmds = 0, touch_misses = 0, delete_misses = 0, incr_misses = 0, decr_misses = 0, cas_misses = 0, bytes_read = 0,
>> >   bytes_written = 0, flush_cmds = 0, conn_yields = 0, auth_cmds = 0, auth_errors = 0, slab_stats = {{set_cmds = 0,
>> >   get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 201 times>}}
>> > (gdb) print threads[7].stats
>> > $28 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
>> >   __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, get_cmds = 4472436,
>> >   get_misses = 3868324, touch_cmds = 0, touch_misses = 0, delete_misses = 0, incr_misses = 0, decr_misses = 0,
>> >   cas_misses = 0, bytes_read = 862203585, bytes_written = 693881564, flush_cmds = 0, conn_yields = 0, auth_cmds = 0,
>> >   auth_errors = 0, slab_stats = {{set_cmds = 0, get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0,
>> >   cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 12 times>, {set_cmds = 496953, get_hits = 604112,
>> >   touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0}, {set_cmds = 0,
>> >   get_hits = 0, touch_hits = 0, delete_hits = 0, cas_hits = 0, cas_badval = 0, incr_hits = 0, decr_hits = 0} <repeats 188 times>}}

--

---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.