Hi,

> On Nov 22, 2017, at 1:34 AM, Jorgen Lundman <[email protected]> wrote:
…
> Then from time to time, it goes crazy, loads goes over 50, nfsd threads
> drop to about 120. All NFS clients spew messages regarding NR_BAD_SEQID and
> NFS4ERR_STALE.

Jorgen, could you decrease the number of NFS server threads to 256 and check the behaviour again?

> Sometimes it recovers, sometimes it reboots. It has been armed with dump
> now, in case it crashes again.

Please look at the fmdump output. It should show stacks even if you didn't save a crash dump.

Garrett, can id_alloc() or arena exhaustion lead to a reboot/crash by design?

Also, there are two NUMA nodes. That can also slow down mutex handling if the processes performing mutex lock/unlock are spread across all nodes. Is there a possibility to bind a process and its allocations to only one NUMA node?

--
Vitaliy Gusev

> On 22 Nov 2017, at 19:31, Garrett D'Amore <[email protected]> wrote:
>
> I was going to say the same thing. I suspect that the id space is exhausted
> or nearly so. The code is spending a lot of time doing cvwait apparently and
> that should not happen unless the arena is exhausted.
>
> The load profile for 48 cores is such that if there are runnable threads the
> load should sit around 48. A perfectly utilized system will have load average
> == cores.
>
> An outstanding question might be why there are so many runnable threads but
> know that many software systems try to scale worker thread counts to match
> core count. (This is not always a good thing for performance but it is common
> practice nevertheless.)

> On Wed, Nov 22, 2017 at 7:38 AM Pavel Zakharov <[email protected]> wrote:
> Hi Jorgen,
>
> I took a quick look at your flamegraph and at the code and what we are seeing
> here looks like a lock contention rather than a memory issue.
>
> It seems like the problem is at id_alloc() which uses the vmem framework to
> allocate unique ids.
> In particular, vmem_nextfit_alloc() is the one that is responsible for your
> slowness as its operation is single threaded.
> I’m somewhat confused by its implementation but my hunch is that it doesn’t
> scale well to 48 CPUs.
>
> It would be interesting to see what the vmem arena backing that space_id_t
> resource looks like.
>
> Regards,
> Pavel

------------------------------------------
illumos-discuss
Archives: https://illumos.topicbox.com/groups/discuss/discussions/T1f149f6156a80f52-M51c0d29df022c901899b816a
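
For anyone wanting to sanity-check the "load average == cores" rule of thumb quoted above on their own box, here is a minimal sketch. It assumes Python is available on the host; the function names `load_ratio` and `describe` are made up for illustration and are not from the thread or from any illumos tool.

```python
import os


def load_ratio(load_avg: float, cores: int) -> float:
    """Return load average divided by core count (1.0 means load == cores)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def describe(load_avg: float, cores: int) -> str:
    """Classify a load average against the 'load == cores' rule of thumb."""
    ratio = load_ratio(load_avg, cores)
    if ratio > 1.0:
        return "more runnable threads than cores"
    if ratio == 1.0:
        return "load matches core count"
    return "spare CPU capacity"


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages.
    one_min, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() may return None
    print(f"load={one_min:.2f} cores={cores} -> {describe(one_min, cores)}")
```

By this measure, a load of just over 50 on 48 cores is only slightly above full utilization, which fits the suggestion that the thing to explain is the number of runnable threads rather than the load figure itself.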
