Excellent! Please let me know if there's any other output you'd like
from this machine.
Strace on that new process shows thousands of these types of syscalls,
which vary slightly,
epoll_wait(3, {{EPOLLIN, {u32=206, u64=206}}}, 200, 239) = 1
and these:
epoll_wait(3, {}, 200, 0) = 0
There is also something of a cascade (each repeats about 10-20x before
the next):
epoll_wait(3, {{EPOLLIN, {u32=47, u64=47}}}, 200, 71) = 1
epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
u64=656}}}, 200, 65) = 2
epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
u64=656}}, {EPOLLIN, {u32=227, u64=227}}}, 200, 0) = 3
epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
u64=656}}, {EPOLLIN, {u32=227, u64=227}}, {EPOLLIN, {u32=785,
u64=785}}}, 200, 65) = 4
epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
u64=656}}, {EPOLLIN, {u32=227, u64=227}}, {EPOLLIN, {u32=785, u64=785}},
{EPOLLIN, {u32=639, u64=639}}}, 200, 64) = 5
I've seen it go as deep as 15. The trace is absolutely dominated by these.
On 1/12/18 10:01 AM, Willy Tarreau wrote:
> On Fri, Jan 12, 2018 at 09:50:58AM -0600, Samuel Reed wrote:
>> To accelerate the process, I've increased the number of threads from 4
>> to 8 on a 16-core machine. Ran strace for about 5s on each.
>>
>> Single process (8 threads):
>>
>> $ strace -cp 16807
>> % time seconds usecs/call calls errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>> 71.36 0.330172 21 15479 epoll_wait
>> 13.59 0.062861 4 14477 1 write
>> 10.58 0.048955 4 11518 10 recvfrom
>> 4.44 0.020544 38 537 244 read
> This one is OK and shows that quite some time is in fact spent waiting
> for I/O events.
>
>> Two processes (2x8 threads):
>>
>> ## Draining process
>>
>> % time seconds usecs/call calls errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>> 48.65 0.544758 30 18359 epoll_wait
>> 28.69 0.321283 14 23540 write
>> 22.60 0.253049 19 13338 recvfrom
>> 0.04 0.000474 1 786 374 read
>> 0.03 0.000287 2 187 sendto
> This one as well.
>
>> ## "New" process
>>
>> % time seconds usecs/call calls errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>> 93.87 1.588239 11 149253 epoll_wait
>> 3.84 0.064985 30 2140 31 recvfrom
>> 1.77 0.029905 13 2388 write
>> 0.34 0.005737 10 589 130 read
>> 0.12 0.002018 38 53 close
>> 0.06 0.000960 8 114 2 sendto
> This one is very interesting! So the epoll_wait to other syscalls ratio
> went from roughly 1/2 to 30/1. I'm pretty sure that a regular strace would
> show you a large number of epoll_wait(0)=0 indicating we're missing some
> events. I seem to remember that sometimes there are situations where a
> thread may be notified by epoll() about an fd it cannot take care of but
> I don't remember in which case, I'll have to discuss with Christopher.
>
> But at least now we have an explanation and it's not directly related to
> thread contention but more likely with the mapping of FDs to threads, so
> we may have opportunities to improve the situation here.
>
> Thanks!
> Willy