Re: Kernel panics after some hours of use (likely related to modeset)

Mark Kettenis Tue, 09 Jan 2018 06:14:53 -0800

> Date: Tue, 9 Jan 2018 12:32:49 +1100
> From: Jonathan Gray <[email protected]>
> 
> On Mon, Jan 08, 2018 at 05:20:39PM -0800, Mike Larkin wrote:
> > On Tue, Jan 09, 2018 at 12:44:04AM +0100, azarus wrote:
> > > To: [email protected]
> > > Subject: Kernel panics after some hours of use (likely related to modeset)
> > > From: [email protected]
> > > Cc: [email protected]
> > > Reply-To: [email protected]
> > > 
> > > >Synopsis:        The kernel panics reproducibly after a couple of hours of use (2-4 hours)
> > > >Category:        system amd64 kernel
> > > >Environment:
> > >   System      : OpenBSD 6.2
> > >   Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
> > >                    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > >   Architecture: OpenBSD.amd64
> > >   Machine     : amd64
> > > >Description:
> > > In snapshots #320-#333 (I tested roughly every second snapshot) the kernel
> > > hangs reproducibly after some hours of use. During use I have a PDF
> > > viewer (mupdf), a browser (Firefox), tmux, an editor (nvim), a music
> > > player (mpd) and some shells (zsh) open.
> > > 
> > > The issue often occurs when I leave the computer alone for a few minutes,
> > > so it might be related to the screen turning off (modeset).
> > > 
> > > This might not be relevant, but I tried with softdep both enabled and
> > > disabled, with the same result.
> > > 
> > > The machine is a ThinkPad X230 running coreboot. (I doubt coreboot is
> > > causing the issue, though, as the computer is not going to sleep.)
> > > 
> > > I cannot provide a dmesg of the crashed system, as "boot dump" fails.
> > > 
> > > For the complete kernel error message and the trace, show registers
> > > and ps output, please see the attached pictures.
> > > 
> > > >How-To-Repeat:
> > >     1. Use machine for a couple of hours
> > >     2. Leave machine for some time (5-15 minutes)
> > >     3. Kernel panics with "uvm_fault(0xfffffff81b4b158, 0x0, 0, 1) -> e"
> > > >Fix:
> > > unknown
> > > 
> > 
> > A few of us have been seeing this, so we know about the issue. There is
> > no fix at this time however. Thanks for reporting it though.
> 
> This is the workaround I have in my tree to avoid the NULL deref.


Sorry for ignoring this until now.  I never found the time to actually
look into this.  Now that I have re-familiarized myself with the
code, I think the fix is right.  If somebody already holds the lock
without a context, we can't establish whether we are the 'older'
process or not.  So returning -EDEADLK would indeed be correct.  And
it looks as if the kms locking code is prepared to handle that case.
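To make the older/younger decision concrete, here is a minimal C sketch of the check for an already-held lock as the patch below changes it. The struct acquire_ctx type and the contended_decision() function are hypothetical stand-ins, not the real ww_mutex API. The sketch assumes, as in Linux, that a lower stamp means an older context. A return value of 0 means "sleep and retry"; -EDEADLK means "back off and release everything".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ww_acquire_ctx; only the stamp is modelled. */
struct acquire_ctx {
	unsigned long stamp;	/* assumed: lower stamp = older context */
};

/*
 * Hypothetical helper mirroring the patched condition in
 * __ww_mutex_lock() once the lock is found to be held.
 *   slow:   caller is on the slow path
 *   ctx:    caller's acquire context, or NULL
 *   holder: owner's acquire context, or NULL if the owner took
 *           the lock without a context
 * Returns 0 if the caller should sleep and retry, -EDEADLK if it
 * should back off.
 */
static int
contended_decision(bool slow, const struct acquire_ctx *ctx,
    const struct acquire_ctx *holder)
{
	/*
	 * The unpatched condition read holder->stamp unconditionally,
	 * which faults when holder is NULL.  With the guard, an unknown
	 * holder means we cannot show we are older, so we back off.
	 */
	if (slow || ctx == NULL ||
	    (holder && ctx->stamp < holder->stamp))
		return 0;
	return -EDEADLK;
}
```

For example, with contexts stamped 1 (older) and 2 (younger), the older caller gets 0 and waits, and the younger one gets -EDEADLK. A caller that has a context but meets a lock held without one also gets -EDEADLK instead of dereferencing NULL. The guard is written as plain `holder`, matching the style requested below.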

One request: could you change "lock->ctx != NULL" to simply "lock->ctx"?

ok kettenis@

> Index: sys/dev/pci/drm/linux_ww_mutex.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/linux_ww_mutex.h,v
> retrieving revision 1.1
> diff -u -p -r1.1 linux_ww_mutex.h
> --- sys/dev/pci/drm/linux_ww_mutex.h  1 Jul 2017 16:14:10 -0000       1.1
> +++ sys/dev/pci/drm/linux_ww_mutex.h  13 Aug 2017 06:40:35 -0000
> @@ -163,7 +163,8 @@ __ww_mutex_lock(struct ww_mutex *lock, s
>                           *   the `younger` process gives up all it's
>                           *   resources.
>                        */
> -                     if (slow || ctx == NULL || ctx->stamp < lock->ctx->stamp) {
> +                     if (slow || ctx == NULL ||
> +                         (lock->ctx != NULL && ctx->stamp < lock->ctx->stamp)) {
>                               int s = msleep(lock, &lock->lock,
>                                              intr ? PCATCH : 0,
>                                              ctx ? ctx->ww_class->name : "ww_mutex_lock", 0);
> 
>
