On Fri, Jan 12, 2018 at 09:50:58AM -0600, Samuel Reed wrote:
> To accelerate the process, I've increased the number of threads from 4
> to 8 on a 16-core machine. Ran strace for about 5s on each.
>
> Single process (8 threads):
>
> $ strace -cp 16807
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  71.36    0.330172          21     15479           epoll_wait
>  13.59    0.062861           4     14477         1 write
>  10.58    0.048955           4     11518        10 recvfrom
>   4.44    0.020544          38       537       244 read
This one is OK and shows that quite some time is in fact spent waiting
for I/O events.

> Two processes (2x8 threads):
>
> ## Draining process
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  48.65    0.544758          30     18359           epoll_wait
>  28.69    0.321283          14     23540           write
>  22.60    0.253049          19     13338           recvfrom
>   0.04    0.000474           1       786       374 read
>   0.03    0.000287           2       187           sendto

This one as well.

> ## "New" process
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  93.87    1.588239          11    149253           epoll_wait
>   3.84    0.064985          30      2140        31 recvfrom
>   1.77    0.029905          13      2388           write
>   0.34    0.005737          10       589       130 read
>   0.12    0.002018          38        53           close
>   0.06    0.000960           8       114         2 sendto

This one is very interesting! The epoll_wait-to-other-syscalls ratio
went from roughly 1/2 to 30/1. I'm pretty sure a regular strace would
show you a large number of epoll_wait() calls returning 0 events,
indicating we're missing some events.

I seem to remember there are situations where a thread may be notified
by epoll() about an fd it cannot take care of, but I don't remember in
which case; I'll have to discuss it with Christopher. But at least now
we have an explanation, and it's not directly related to thread
contention but more likely to the mapping of FDs to threads, so we may
have opportunities to improve the situation here.

Thanks!
Willy
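The ratio mentioned above can be recomputed directly from the quoted
`strace -c` output. A small awk sketch (the here-doc simply repeats the
"new" process table from the mail; field 4 is the calls column):

```shell
awk '/epoll_wait/   { ep = $4 }
     !/epoll_wait/  { other += $4 }
     END { printf "ratio ~ %.0f/1\n", ep / other }' <<'EOF'
 93.87    1.588239          11    149253           epoll_wait
  3.84    0.064985          30      2140        31 recvfrom
  1.77    0.029905          13      2388           write
  0.34    0.005737          10       589       130 read
  0.12    0.002018          38        53           close
  0.06    0.000960           8       114         2 sendto
EOF
```

With these numbers it prints "ratio ~ 28/1" (149253 epoll_wait calls
against 5284 for everything else), which is the "roughly 30/1" figure
cited above.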