On Mon, Jan 08, 2018 at 05:20:39PM -0800, Mike Larkin wrote:
> On Tue, Jan 09, 2018 at 12:44:04AM +0100, azarus wrote:
> > To: [email protected]
> > Subject: Kernel panics after some hours of use (likely related to modeset)
> > From: [email protected]
> > Cc: [email protected]
> > Reply-To: [email protected]
> >
> > >Synopsis: The kernel panics reproducibly after a couple of hours of use (2-4 hours)
> > >Category: system amd64 kernel
> > >Environment:
> > System : OpenBSD 6.2
> > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan 7 09:13:00 MST 2018
> >
> > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > In snapshots #320-#333 (every second snapshot or so tested) the kernel
> > hangs reproducibly after some hours of use. During use I have a pdf
> > viewer (mupdf), a browser (Firefox), tmux, an editor (nvim), a music
> > player (mpd) and some shells open (zsh).
> >
> > This issue happens often when I leave the computer for some minutes, so
> > it might be something related to the screen turning off (modeset).
> >
> > This might not be relevant, but I tried both with softdep enabled and
> > disabled, to the same result.
> >
> > The machine is a ThinkPad X230, with Coreboot. (But I doubt it's
> > coreboot causing the issue, as the computer's not going to sleep)
> >
> > I cannot provide a dmesg of the crashed system, as "boot dump" fails.
> >
> > For the complete kernel error message, trace output, show registers
> > output and ps output, please see the attached pictures.
> >
> > >How-To-Repeat:
> > 1. Use machine for a couple of hours
> > 2. Leave machine for some time (5-15 minutes)
> > 3. Kernel panics with "uvm_fault(0xfffffff81b4b158, 0x0, 0, 1) -> e"
> > >Fix:
> > unknown
> >
>
> A few of us have been seeing this, so we know about the issue. There is
> no fix at this time however. Thanks for reporting it though.

This is the workaround I have in my tree to avoid the NULL deref.

Index: sys/dev/pci/drm/linux_ww_mutex.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/linux_ww_mutex.h,v
retrieving revision 1.1
diff -u -p -r1.1 linux_ww_mutex.h
--- sys/dev/pci/drm/linux_ww_mutex.h 1 Jul 2017 16:14:10 -0000 1.1
+++ sys/dev/pci/drm/linux_ww_mutex.h 13 Aug 2017 06:40:35 -0000
@@ -163,7 +163,8 @@ __ww_mutex_lock(struct ww_mutex *lock, s
* the `younger` process gives up all it's
* resources.
*/
- if (slow || ctx == NULL || ctx->stamp < lock->ctx->stamp) {
+ if (slow || ctx == NULL ||
+ (lock->ctx != NULL && ctx->stamp < lock->ctx->stamp)) {
int s = msleep(lock, &lock->lock,
intr ? PCATCH : 0,
ctx ? ctx->ww_class->name : "ww_mutex_lock", 0);