> Date: Tue, 9 Jan 2018 12:32:49 +1100
> From: Jonathan Gray <[email protected]>
>
> On Mon, Jan 08, 2018 at 05:20:39PM -0800, Mike Larkin wrote:
> > On Tue, Jan 09, 2018 at 12:44:04AM +0100, azarus wrote:
> > > To: [email protected]
> > > Subject: Kernel panics after some hours of use (likely related to modeset)
> > > From: [email protected]
> > > Cc: [email protected]
> > > Reply-To: [email protected]
> > >
> > > >Synopsis: The kernel panics reproducibly after a couple of hours
> > > >of use (2-4 hours)
> > > >Category: system amd64 kernel
> > > >Environment:
> > > System : OpenBSD 6.2
> > > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan 7
> > > 09:13:00 MST 2018
> > >
> > > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> > > Architecture: OpenBSD.amd64
> > > Machine : amd64
> > > >Description:
> > > In snapshots #320-#333 (every second snapshot or so tested) the kernel
> > > hangs reproducibly after some hours of use. During use I have a pdf
> > > viewer (mupdf), a browser (Firefox), tmux, an editor (nvim), a music
> > > player (mpd) and some shells open (zsh).
> > >
> > > This issue happens often when I leave the computer for some minutes, so
> > > it might be something related to the screen turning off (modeset).
> > >
> > > This might not be relevant, but I tried both with softdep enabled and
> > > disabled, to the same result.
> > >
> > > The machine is a ThinkPad X230, with Coreboot. (But I doubt it's
> > > coreboot causing the issue, as the computer's not going to sleep)
> > >
> > > I cannot provide a dmesg of the crashed system, as "boot dump" fails.
> > >
> > > For the complete kernel error message, trace output, show registers
> > > output and ps output, please see the attached pictures.
> > >
> > > >How-To-Repeat:
> > > 1. Use machine for a couple of hours
> > > 2. Leave machine for some time (5-15 minutes)
> > > 3. Kernel panics with "uvm_fault(0xfffffff81b4b158, 0x0, 0, 1) -> e"
> > > >Fix:
> > > unknown
> > >
> >
> > A few of us have been seeing this, so we know about the issue. There is
> > no fix at this time however. Thanks for reporting it though.
>
> This is the workaround I have in my tree to avoid the NULL deref.
Sorry for ignoring this until now. I never found the time to actually
look into this. Now that I have re-familiarized myself with the
code, I think the fix is right. If somebody already locked the lock
without a context, we can't establish whether we are the 'older'
process or not. So returning -EDEADLK would indeed be correct. And
it looks as if the kms locking code is prepared to handle that case.
One request: could you replace "lock->ctx != NULL" with simply "lock->ctx"?
ok kettenis@
> Index: sys/dev/pci/drm/linux_ww_mutex.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/linux_ww_mutex.h,v
> retrieving revision 1.1
> diff -u -p -r1.1 linux_ww_mutex.h
> --- sys/dev/pci/drm/linux_ww_mutex.h 1 Jul 2017 16:14:10 -0000 1.1
> +++ sys/dev/pci/drm/linux_ww_mutex.h 13 Aug 2017 06:40:35 -0000
> @@ -163,7 +163,8 @@ __ww_mutex_lock(struct ww_mutex *lock, s
> * the `younger` process gives up all it's
> * resources.
> */
> - if (slow || ctx == NULL || ctx->stamp < lock->ctx->stamp) {
> + if (slow || ctx == NULL ||
> + (lock->ctx != NULL && ctx->stamp < lock->ctx->stamp)) {
> int s = msleep(lock, &lock->lock,
> intr ? PCATCH : 0,
> ctx ? ctx->ww_class->name :
> "ww_mutex_lock", 0);
>
>
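
For reference, here is a minimal sketch of the decision the patched
condition makes when the mutex is already held. The types and names
(acquire_ctx, mutex_model, contended_action) are hypothetical stand-ins,
not taken from linux_ww_mutex.h, and the sketch assumes the branch that
does not sleep is the one that ends up returning -EDEADLK, as discussed
above. It uses the shorter "lock->ctx" test.

```c
#include <stddef.h>

/* Hypothetical stand-in for the acquire context: lower stamp = older. */
struct acquire_ctx {
	unsigned long stamp;
};

/* Hypothetical stand-in for a held mutex; ctx is NULL when the
 * current owner took the lock without an acquire context. */
struct mutex_model {
	struct acquire_ctx *ctx;
};

enum action { ACT_SLEEP, ACT_DEADLK };

/* What a contending locker does when the mutex is already owned. */
static enum action
contended_action(const struct mutex_model *lock,
    const struct acquire_ctx *ctx, int slow)
{
	/*
	 * Sleep if we are on the slow path, if we have no context, or
	 * if the owner has a context and we are older than it.  The
	 * lock->ctx test is what avoids the NULL dereference.
	 */
	if (slow || ctx == NULL ||
	    (lock->ctx && ctx->stamp < lock->ctx->stamp))
		return ACT_SLEEP;

	/* Otherwise back off; the caller is assumed to see -EDEADLK. */
	return ACT_DEADLK;
}
```

With an owner that has no context and a caller that does, the sketch
picks the back-off branch instead of dereferencing lock->ctx, since the
relative age of the two cannot be established.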