On Fri, Jan 12, 2018 at 09:28:54AM -0600, Samuel Reed wrote:
> Thanks for your quick answer, Willy.
> 
> That's a shame to hear but makes sense. We'll try out some ideas for
> reducing contention. We don't use cpu-map with nbthread; I considered it
> best to let the kernel take care of this, especially since there are
> some other processes on that box.

So that definitely explains why 5 instances start to give you a high load
with 4 threads on 16 cores. By the way, do you happen to see some processes
running at 100% CPU (or in fact 400% since you have 4 threads)? It is
possible that some remaining bugs cause older processes and their
threads to spin a bit too much.

If you're interested, when this happens you could run "strace -cp $pid"
for a few seconds; it will report the syscall count over that period. A
typical rule of thumb is that if you see more epoll_wait() than recvfrom()
or read(), there's an issue somewhere in the code.
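
To make that rule of thumb easy to apply, here is a rough Python sketch that
reads the summary table printed by "strace -c" and compares the counts. The
function names are made up for illustration, the sample table contains
invented numbers (not a capture from a real haproxy process), and comparing
epoll_wait() against the sum of recvfrom() and read() is only one reading of
the rule.

```python
def syscall_counts(strace_summary):
    """Parse the table printed by `strace -c` into {syscall: calls}."""
    counts = {}
    for line in strace_summary.splitlines():
        fields = line.split()
        # Skip blank lines and the final "total" row.
        if len(fields) < 5 or fields[-1] == "total":
            continue
        try:
            float(fields[0])        # "% time" column; header and separator rows fail here
            calls = int(fields[3])  # "calls" column
        except ValueError:
            continue
        counts[fields[-1]] = calls  # syscall name is always the last column
    return counts


def looks_like_spinning(counts):
    """Assumed reading of the rule: more epoll_wait() than recvfrom() + read()."""
    reads = counts.get("recvfrom", 0) + counts.get("read", 0)
    return counts.get("epoll_wait", 0) > reads


# Invented numbers, shaped like a `strace -c` summary, for illustration only.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 80.00    0.040000           4      9000           epoll_wait
 15.00    0.007500           7      1000       200 recvfrom
  5.00    0.002500           5       500           read
------ ----------- ----------- --------- --------- ----------------
100.00    0.050000                 10500       200 total
"""

if __name__ == "__main__":
    counts = syscall_counts(SAMPLE)
    print(counts)
    print("possible spinning:", looks_like_spinning(counts))
```

In practice you would save the strace summary to a file and pass its contents
to syscall_counts() instead of SAMPLE.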

> I don't really want to fall back to
> nbproc but we may have to, at least until we get the number of reloads down.

It's possible, but let's see if there's a way to improve the situation a
bit by gathering some more information first ;-)

Willy
