On 13/09/21(Mon) 08:25, M Smith wrote:
> On 8/09/21 3:37 am, Martin Pieuchot wrote:
> > Hello,
> > 
> > Thanks for your bug report.
> > 
> > On 07/09/21(Tue) 15:18, M Smith wrote:
> > > > Synopsis:       OpenBSD amd64 6.9 repeatable kernel panic starting X
> > > > Category:       kernel
> > > > Environment:
> > > 
> > >   System      : OpenBSD 6.9
> > >   Details     : OpenBSD 6.9 (GENERIC.MP) #4: Tue Aug 10 08:12:23 MDT 2021
> > >                   
> > > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > >   Architecture: OpenBSD.amd64
> > >   Machine     : amd64
> > > 
> > > > Description:
> > > 
> > >           I have been investigating a largely repeatable OpenBSD 6.9 
> > > amd64 panic.  Essentially the OS drops into the kernel debugger about 90% 
> > > of the time when starting X on specific hardware, and is doing so with 
> > > what seems like a memory related issue - possibly errant modification by 
> > > concurrent threads.
> > 
> > Indeed.  You're certainly hitting a VM/pmap bug.
> > 
> > >   The event is reproducible across two independent machines (both new).  
> > > Each machine has identical underlying hardware.  A memory checker run 
> > > overnight on one machine did not identify any underlying memory issues.
> > 
> > That points to something in your setup which exposes the bug.
> > 
> > >   The hardware: Avalue EMS-TGL-S85-A1-1R, CPU an 11th Gen Intel(R) 
> > > Core(TM) i7-1185G7E @ 2.80GHz with 2x 16GB memory boards (32GB in total).
> > > 
> > >   Regarding the possible errant memory modification: the assertion 
> > > underlying this panic 
> > > (https://www.sirranet.co.nz/openbsd_542456/69_panic.html) suggests that 
> > > kernel execution has failed to obtain a necessary exclusive lock.  
> > > Various other panics differ in that many feature assertions based on 
> > > "pool_do_get ... offset ???" with the offset identifying the trigger 
> > > condition, hinting at a memory inconsistency.
> > > 
> > >   Testing on 7.0-current 
> > > (https://www.sirranet.co.nz/openbsd_542456/70_panic.html) sometimes 
> > > results in a panic on boot before invoking startX, other times the boot 
> > > fails to complete cleanly at the kernel linking step with the error 
> > > "reordering libraries ld in calloc(): chunk info corrupted" and similar 
> > > errors.  Whether these two events are related to the 6.9 panic is 
> > > anything but conclusive.
> > > 
> > >   I see others have posted what looks like the same issue.  I have posted 
> > > the above detail however as the assert identifying the lack of kernel 
> > > lock looks as though it may be of some value.
> > >   https://marc.info/?t=161769314800002&r=1&w=2
> > >   https://marc.info/?t=162390602600001&r=1&w=2
> > 
> > All those reports have in common an 11th Gen Intel CPU.
> > 
> > >   Any ideas would be greatly appreciated.
> > 
> > You could start by booting bsd.sp to rule out any HW problem.
> 
> Sorry for the delay in replying.
> 
> Both 6.9 and 7.0 crash when booting bsd.sp
> https://www.sirranet.co.nz/openbsd_542456/69_reply.html
> https://www.sirranet.co.nz/openbsd_542456/70_reply.html

That rules out any concurrency issue.

> > Does the corruption happen with a vanilla install, or does running a
> > particular program make it easier to trigger?
> 
> These are both basic installs. After a fresh install I have run fw_update,
> and on the 6.9 machine syspatch was run. Other than that we have enabled
> xenodm. No other software or packages are installed or running. The machines
> don't always crash on first boot, but after a handful of reboots they do.
> 
> > >   I can easily test/re-test on both 6.9 and 7.0-current.
> > 
> > Does it also happen if you disable drm at boot?
> > 
> 
> On both 6.9 and 7.0, if I disable drm the machine panics on reboot. (Images
> in the links above.)

Please make sure you also disable inteldrm(4).  That's why you're
getting a panic on 6.9.  This is to see if the issue is related to
the graphics driver.
