Re: Sparc64 livelock/system freeze w/cpu traces

Vitaliy Makkoveev Wed, 14 Jun 2023 09:52:06 -0700

On Tue, May 30, 2023 at 01:31:08PM +0200, Martin Pieuchot wrote:
> On 25/05/23(Thu) 16:33, Kurt Miller wrote:
> > On May 22, 2023, at 2:27 AM, Claudio Jeker <clau...@openbsd.org> wrote:
> > > I have seen these WITNESS warnings on other systems as well. I doubt this
> > > is the problem. IIRC this warning is because sys_mount() is doing it wrong
> > > but it is not really an issue since sys_mount is not called often.
> > 
> > Yup. I see that now that I have tested witness on several arches. They all
> > show this lock order reversal right after booting the system. I guess this
> > means what I am seeing isn’t something that witness detects.
> > 
> > On -current with my T4-1, I can reliably reproduce the issues I am seeing.
> > While the problem is intermittent I can’t get very far into the jdk build
> > without tripping it. Instructions for reproducing the issue are:
> > 
> > Add wxallowed to /usr/local/ and /usr/ports (or wherever WRKOBJDIR has
> > been changed to)
> > 
> > doas pkg_add jdk zip unzip cups-libs bash gmake libiconv giflib
> > 
> > cd /usr/ports/devel/jdk/1.8
> > FLAVOR=native_bootstrap make
> > 
> > There are two stages to the problem. A java command (or javac or javah)
> > gets stuck making forward progress and nearly all of its cpu time is in
> > sys time category. You can see this in top as 1500-3000% CPU time on
> > the java process. ktrace of the process in this state shows endless
> > sched_yield() calls. Debugging shows many threads in
> > pthread_cond_wait(3). The condition vars are configured to use
> > CLOCK_MONOTONIC.
> > 
> > The second stage of the problem is when things lock up. While java is
> > spinning in this sched_yield() state, if you display the process arguments
> > in top (pressing the right arrow) you trip the lockups. top stops
> > responding. getty will reprompt if enter is pressed, but locks up if a
> > username is entered. Most processes lock up when doing anything after this
> > point. ddb ps at this stage shows top waiting on vmmaplk and the rest of
> > the stuck processes waiting on sysctllk (sshd, systat, login).
> 
> So it seems the java process is holding the `sysctl_lock' for too long
> and blocks all other sysctl(2) calls.  This seems wrong to me.  We should come
> up with a clever way to prevent vslocking too much memory.  A single
> lock obviously doesn't fly with that many CPUs. 
>


We vslock memory to prevent a context switch while doing copyin() and
copyout(), right? This is required to avoid a context switch within foreach
loops over kernel lock protected lists. But it does not seem to be required
for simple sysctl_int() calls or rwlock protected data. So the sysctl_lock
acquisition and the uvm_vslock() calls could be avoided for a significant
number of mibs and pushed deeper down for the rest.
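
To illustrate the idea, here is a rough user-space model. It is not kernel
code and not a proposed diff: a pthread mutex stands in for sysctl_lock, a
memcpy() stands in for the locked list walk plus copyout(), and every name
(model_sysctl_int, model_sysctl_table, big_lock and so on) is made up for
this sketch. The only point is that the simple integer path never touches
the global lock, while the handler that really needs it takes it itself.

```c
/*
 * User-space model only; this is NOT OpenBSD kernel code.
 * All names here (model_*, big_lock, ...) are hypothetical.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the single global sysctl_lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how often the global lock was taken, for illustration. */
static atomic_int big_lock_acquisitions;

/* A simple integer knob, like something a sysctl_int() call would serve. */
static atomic_int simple_knob = 42;

/* A list-like table; copying it out must not interleave with updates. */
#define TABLE_LEN 4
static int table[TABLE_LEN] = { 1, 2, 3, 4 };

/* Fast path: one atomic load into a local, no global lock, no pinning. */
int
model_sysctl_int(int *out)
{
	int v = atomic_load(&simple_knob);

	*out = v;	/* stands in for copyout() of a local copy */
	return 0;
}

/* Slow path: the lock is taken only here, in the handler that needs it. */
int
model_sysctl_table(int *out, size_t len)
{
	assert(out != NULL);
	if (len < TABLE_LEN)
		return -1;

	pthread_mutex_lock(&big_lock);
	atomic_fetch_add(&big_lock_acquisitions, 1);
	memcpy(out, table, sizeof(table));	/* stands in for the list walk */
	pthread_mutex_unlock(&big_lock);
	return 0;
}

/* Helper for illustration: how many times the global lock was taken. */
int
model_big_lock_count(void)
{
	return atomic_load(&big_lock_acquisitions);
}
```

In this model, any number of threads calling model_sysctl_int() never
contend on big_lock; only callers of model_sysctl_table() serialize on it.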

Re: Sparc64 livelock/system freeze w/cpu traces
