On 8/09/21 3:37 am, Martin Pieuchot wrote:
Hello,

Thanks for your bug report.

On 07/09/21(Tue) 15:18, M Smith wrote:
Synopsis:       OpenBSD amd64 6.9 repeatable kernel panic starting X
Category:       kernel
Environment:

        System      : OpenBSD 6.9
        Details     : OpenBSD 6.9 (GENERIC.MP) #4: Tue Aug 10 08:12:23 MDT 2021
                        r...@syspatch-69-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

        Architecture: OpenBSD.amd64
        Machine     : amd64

Description:

                I have been investigating a largely repeatable OpenBSD 6.9 
amd64 panic.  The OS drops into the kernel debugger about 90% of the time 
when starting X on specific hardware, apparently because of a memory-related 
issue, possibly errant modification by concurrent threads.

Indeed.  You're certainly hitting a VM/pmap bug.

        The event is reproducible across two independent machines (both new).  
Each machine has identical underlying hardware.  A memory checker run overnight 
on one machine did not identify any underlying memory issues.

That points to something in your setup which exposes the bug.

        The hardware: Avalue EMS-TGL-S85-A1-1R, CPU an 11th Gen Intel(R) 
Core(TM) i7-1185G7E @ 2.80GHz with 2x 16GB memory boards (32GB in total).

        Regarding the possible errant memory modification: the assertion underlying this 
panic (https://www.sirranet.co.nz/openbsd_542456/69_panic.html) suggests that kernel 
execution has failed to obtain a necessary exclusive lock.  Many of the other panics 
differ in that they feature assertions based on "pool_do_get ... offset ???", with the 
offset identifying the trigger condition, hinting at a memory inconsistency.

        Testing on 7.0-current (https://www.sirranet.co.nz/openbsd_542456/70_panic.html) 
sometimes results in a panic on boot before invoking startX; other times the boot fails 
to complete cleanly at the kernel linking step with the error "reordering libraries 
ld in calloc(): chunk info corrupted" and similar errors.  It is not clear whether 
these two events are related to the 6.9 panic.

        I see others have posted what looks like the same issue.  I have posted 
the above detail anyway, as the assertion identifying the missing kernel lock 
looks as though it may be of some value.
        https://marc.info/?t=161769314800002&r=1&w=2
        https://marc.info/?t=162390602600001&r=1&w=2

All those reports have in common an 11th Gen Intel CPU.

        Any ideas would be greatly appreciated.

You could start by booting bsd.sp to rule out any HW problem.

Sorry for the delay in replying.

Both 6.9 and 7.0 crash when booting bsd.sp
https://www.sirranet.co.nz/openbsd_542456/69_reply.html
https://www.sirranet.co.nz/openbsd_542456/70_reply.html

Does the corruption happen with a vanilla install, or does running a
particular program make it easier to trigger?

These are both basic installs. After a fresh install I ran fw_update, and on the 6.9 machine I also ran syspatch. Other than that, we have enabled xenodm. No other software or packages are installed or running. The machines don't always crash on first boot, but after a handful of reboots they do.

        I can easily test/re-test on both 6.9 and 7.0-current.

Does it also happen if you disable drm at boot?


On both 6.9 and 7.0 if I disable drm the machine panics on reboot. (Images in the links above.)


Thanks for your help.
Megan
