To accelerate the process, I've increased the number of threads from 4
to 8 on a 16-core machine. Ran strace for about 5s on each.

Single process (8 threads):

$ strace -cp 16807
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 71.36    0.330172          21     15479           epoll_wait
 13.59    0.062861           4     14477         1 write
 10.58    0.048955           4     11518        10 recvfrom
  4.44    0.020544          38       537       244 read
  0.02    0.000094           1       135           sendto
  0.01    0.000051           3        16           epoll_ctl
  0.00    0.000000           0        24           close
  0.00    0.000000           0         9           socket
  0.00    0.000000           0         9         9 connect
  0.00    0.000000           0        25           sendmsg
  0.00    0.000000           0         6           shutdown
  0.00    0.000000           0        24           setsockopt
  0.00    0.000000           0         3           getsockopt
  0.00    0.000000           0         9           fcntl
  0.00    0.000000           0         8         7 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.462677                 42279       271 total


Two processes (2x8 threads):

## Draining process

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 48.65    0.544758          30     18359           epoll_wait
 28.69    0.321283          14     23540           write
 22.60    0.253049          19     13338           recvfrom
  0.04    0.000474           1       786       374 read
  0.03    0.000287           2       187           sendto
  0.00    0.000000           0         2           close
  0.00    0.000000           0         1           sendmsg
  0.00    0.000000           0         1           shutdown
  0.00    0.000000           0         1           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    1.119851                 56215       374 total

## "New" process

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.87    1.588239          11    149253           epoll_wait
  3.84    0.064985          30      2140        31 recvfrom
  1.77    0.029905          13      2388           write
  0.34    0.005737          10       589       130 read
  0.12    0.002018          38        53           close
  0.06    0.000960           8       114         2 sendto
  0.00    0.000031           1        25           shutdown
  0.00    0.000019           0       102           sendmsg
  0.00    0.000019           0        58           epoll_ctl
  0.00    0.000015           0        31           fcntl
  0.00    0.000000           0        31           socket
  0.00    0.000000           0        31        31 connect
  0.00    0.000000           0        94           setsockopt
  0.00    0.000000           0         8           getsockopt
  0.00    0.000000           0        47        29 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    1.691928                154964       223 total


It does indeed appear the new process is contending with the old, even
with just 16 threads on a 16-core box. A third process (oversubscribed
to 24 threads on 16 cores):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.21    0.950863          14     69926           epoll_wait
  2.32    0.022677         208       109           write
  0.42    0.004106          48        85        14 recvfrom
  0.04    0.000439          17        26           close
  0.00    0.000022           1        34           epoll_ctl
  0.00    0.000000           0       136        33 read
  0.00    0.000000           0         1           brk
  0.00    0.000000           0        14           socket
  0.00    0.000000           0        14        14 connect
  0.00    0.000000           0        15         1 sendto
  0.00    0.000000           0        11           sendmsg
  0.00    0.000000           0        13         1 shutdown
  0.00    0.000000           0        50           setsockopt
  0.00    0.000000           0         3           getsockopt
  0.00    0.000000           0        14           fcntl
  0.00    0.000000           0        34        22 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.978107                 70485        85 total


During this time, each of the three processes was running at roughly
250-350% CPU.


On 1/12/18 9:34 AM, Willy Tarreau wrote:
> On Fri, Jan 12, 2018 at 09:28:54AM -0600, Samuel Reed wrote:
>> Thanks for your quick answer, Willy.
>>
>> That's a shame to hear but makes sense. We'll try out some ideas for
>> reducing contention. We don't use cpu-map with nbthread; I considered it
>> best to let the kernel take care of this, especially since there are
>> some other processes on that box.
> So that definitely explains why 5 instances start to give you a high load
> with 4 threads on 16 cores. Note, do you happen to see some processes
> running at 100% CPU (or in fact 400% since you have 4 threads) ? It would
> be possible that some remaining bugs would cause older processes and their
> threads to spin a bit too much.
>
> If you're interested, when this happens you could run "strace -cp $pid"
> for a few seconds, it will report the syscall count over that period. A
> typical rule of thumb is that if you see more epoll_wait() than recvfrom()
> or read(), there's an issue somewhere in the code.
>
>> I don't really want to fall back to
>> nbproc but we may have to, at least until we get the number of reloads down.
> It's possible, but let's see if there's a way to improve the situation a
> bit by gathering some elements first ;-)
>
> Willy


Reply via email to