On 13/09/21(Mon) 08:25, M Smith wrote:
> On 8/09/21 3:37 am, Martin Pieuchot wrote:
> > Hello,
> >
> > Thanks for your bug report.
> >
> > On 07/09/21(Tue) 15:18, M Smith wrote:
> > > > Synopsis: OpenBSD amd64 6.9 repeatable kernel panic starting X
> > > > Category: kernel
> > > > Environment:
> > >
> > > System : OpenBSD 6.9
> > > Details : OpenBSD 6.9 (GENERIC.MP) #4: Tue Aug 10 08:12:23 MDT 2021
> > >
> > > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> > > Architecture: OpenBSD.amd64
> > > Machine : amd64
> > >
> > > > Description:
> > >
> > > I have been investigating a largely repeatable OpenBSD 6.9
> > > amd64 panic. Essentially the OS drops into the kernel debugger about 90%
> > > of the time when starting X on specific hardware, and is doing so with
> > > what seems like a memory related issue - possibly errant modification by
> > > concurrent threads.
> >
> > Indeed. You're certainly hitting a VM/pmap bug.
> >
> > > The event is reproducible across two independent machines (both new).
> > > Each machine has identical underlying hardware. A memory checker run
> > > overnight on one machine did not identify any underlying memory issues.
> >
> > That points to something in your setup which exposes the bug.
> >
> > > The hardware: Avalue EMS-TGL-S85-A1-1R, CPU an 11th Gen Intel(R)
> > > Core(TM) i7-1185G7E @ 2.80GHz with 2x 16GB memory boards (32GB in total).
> > >
> > > The mentioned possible errant memory modification, the assertion
> > > underlying this panic
> > > (https://www.sirranet.co.nz/openbsd_542456/69_panic.html) suggests that
> > > kernel execution has failed to obtain a necessary exclusivity lock.
> > > Various other panics differ in that many feature assertions based on
> > > "pool_do_get ... offset ???" with the offset identifying the trigger
> > > condition, hinting at a memory inconsistency.
> > >
> > > Testing on 7.0-current
> > > (https://www.sirranet.co.nz/openbsd_542456/70_panic.html) sometimes
> > > results in a panic on boot before invoking startX, other times the boot
> > > fails to complete cleanly at the kernel linking step with the error
> > > "reordering libraries ld in calloc(): chunk info corrupted" and similar
> > > errors. Whether these two events are related to the 6.9 panic is
> > > anything but conclusive.
> > >
> > > I see others have posted what looks like the same issue. I have posted
> > > the above detail however as the assert identifying the lack of kernel
> > > lock looks as though it may be of some value.
> > > https://marc.info/?t=161769314800002&r=1&w=2
> > > https://marc.info/?t=162390602600001&r=1&w=2
> >
> > All those reports have in common an 11th Gen Intel CPU.
> >
> > > Any ideas would be greatly appreciated.
> >
> > You could start by booting bsd.sp to rule out any HW problem.
>
> Sorry for the delay in replying.
>
> Both 6.9 and 7.0 crash when booting bsd.sp
> https://www.sirranet.co.nz/openbsd_542456/69_reply.html
> https://www.sirranet.co.nz/openbsd_542456/70_reply.html

That rules out any concurrency issue.

> > Does the corruption happen with a vanilla install or does running
> > a particular program make it easier to happen?
>
> These are both basic installs. After a fresh install I have run fw_update,
> and on the 6.9 machine syspatch was run. Other than that we have enabled
> xenodm. No other software or packages are installed or running. The machines
> don't always crash on first boot, but after a handful of reboots they do.
>
> > > I can easily test/re-test on both 6.9 and 7.0-current.
> >
> > Does it also happen if you disable drm at boot?
>
> On both 6.9 and 7.0 if I disable drm the machine panics on reboot. (Images
> in the links above.)

Please make sure you also disable inteldrm(4). That's why you're getting
a panic on 6.9. This is to see if the issue is related to the graphics
driver.
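
A minimal sketch of disabling both drivers for a single test boot with the
boot-time kernel configuration editor, see boot_config(8). It assumes the
stock amd64 boot loader prompt and the default kernel; the change is not
saved and only applies to that boot.

```text
boot> boot -c
UKC> disable inteldrm
UKC> disable drm
UKC> quit
```

After "quit" the kernel continues booting with those devices disabled, so a
panic that still occurs would point away from the graphics driver.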
