On Fri, Jan 12, 2018 at 09:28:54AM -0600, Samuel Reed wrote:
> Thanks for your quick answer, Willy.
>
> That's a shame to hear but makes sense. We'll try out some ideas for
> reducing contention. We don't use cpu-map with nbthread; I considered it
> best to let the kernel take care of this, especially since there are
> some other processes on that box.

So that definitely explains why 5 instances start to give you a high load
with 4 threads on 16 cores. By the way, do you happen to see some processes
running at 100% CPU (or in fact 400%, since you have 4 threads)? It's
possible that some remaining bugs cause older processes and their threads
to spin a bit too much. If you're interested, when this happens you could
run "strace -cp $pid" for a few seconds; it will report the syscall counts
over that period. A typical rule of thumb is that if you see more
epoll_wait() than recvfrom() or read(), there's an issue somewhere in the
code.

> I don't really want to fall back to
> nbproc but we may have to, at least until we get the number of reloads down.

It's possible, but let's see if there's a way to improve the situation a
bit by gathering some elements first ;-)

Willy
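
The strace rule of thumb above can be checked mechanically. The following is only a minimal sketch in Python, not something from haproxy itself: the function names and the SAMPLE figures are made up for illustration, and it assumes the usual "strace -c" summary layout (% time, seconds, usecs/call, calls, optional errors, syscall). It also assumes that "more epoll_wait() than recvfrom() or read()" means comparing against the sum of both, which is one possible reading.

```python
# Hypothetical helper: flags a process whose strace -c summary shows more
# epoll_wait() calls than recvfrom() + read() calls combined.

# Illustrative, made-up summary in the usual "strace -c" layout.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 70.00    0.007000           7      1000           epoll_wait
 20.00    0.002000          10       200        50 recvfrom
 10.00    0.001000          10       100           read
------ ----------- ----------- --------- --------- ----------------
100.00    0.010000                  1300        50 total
"""


def parse_strace_summary(text):
    """Return {syscall_name: call_count} from strace -c summary text."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows: %time seconds usecs/call calls [errors] syscall
        if len(fields) < 5 or fields[-1] == "total":
            continue
        try:
            float(fields[0])                    # rejects header and separators
            counts[fields[-1]] = int(fields[3])  # 4th column is the call count
        except ValueError:
            continue
    return counts


def looks_like_spinning(counts):
    """Apply the rule of thumb: more polling than actual reads is suspicious."""
    polls = counts.get("epoll_wait", 0)
    reads = counts.get("recvfrom", 0) + counts.get("read", 0)
    return polls > reads
```

To try it on a real process, the summary could be saved with something like "strace -c -p $pid -o summary.txt", interrupted after a few seconds, and the file contents passed to parse_strace_summary(). A positive result is only a hint that a process may be spinning, not a diagnosis.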