Hello,

Thanks for your bug report.

On 07/09/21(Tue) 15:18, M Smith wrote:
> > Synopsis:   OpenBSD amd64 6.9 repeatable kernel panic starting X
> > Category:   kernel
> > Environment:
> 
>       System      : OpenBSD 6.9
>       Details     : OpenBSD 6.9 (GENERIC.MP) #4: Tue Aug 10 08:12:23 MDT 2021
>                       
> r...@syspatch-69-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>       Architecture: OpenBSD.amd64
>       Machine     : amd64
> 
> > Description:
> 
>               I have been investigating a largely repeatable OpenBSD 6.9 
> amd64 panic.  Essentially the OS drops into the kernel debugger about 90% of 
> the time when starting X on specific hardware, and is doing so with what 
> seems like a memory related issue - possibly errant modification by 
> concurrent threads.

Indeed.  You're certainly hitting a VM/pmap bug.  

>       The event is reproducible across two independent machines (both new).  
> Each machine has identical underlying hardware.  A memory checker run 
> overnight on one machine did not identify any underlying memory issues.

That points to something in your setup which exposes the bug.

>       The hardware: Avalue EMS-TGL-S85-A1-1R, CPU an 11th Gen Intel(R) 
> Core(TM) i7-1185G7E @ 2.80GHz with 2x 16GB memory boards (32GB in total).
> 
>       The mentioned possible errant memory modification, the assertion 
> underlying this panic 
> (https://www.sirranet.co.nz/openbsd_542456/69_panic.html) suggests that 
> kernel execution has failed to obtain a necessary exclusivity lock.  Various 
> other panics differ in that many feature assertions based on "pool_do_get ... 
> offset ???" with the offset identifying the trigger condition, hinting at a 
> memory inconsistency.
> 
>       Testing on 7.0-current 
> (https://www.sirranet.co.nz/openbsd_542456/70_panic.html) sometimes results 
> in a panic on boot before invoking startX, other times the boot fails to 
> complete cleanly at the kernel linking step with the error "reordering 
> libraries ld in calloc(): chunk info corrupted" and similar errors.  Whether 
> these two events are related to the 6.9 panic is anything but conclusive.
> 
>       I see others have posted what looks like the same issue.  I have posted 
> the above detail however as the assert identifying the lack of kernel lock 
> looks as though it may be of some value.
>       https://marc.info/?t=161769314800002&r=1&w=2
>       https://marc.info/?t=162390602600001&r=1&w=2

All those reports have an 11th Gen Intel CPU in common.

>       Any ideas would be greatly appreciated.

You could start by booting bsd.sp to rule out any HW problem.

Does the corruption happen with a vanilla install, or does running a
particular program make it easier to trigger?
 
>       I can easily test/re-test on both 6.9 and 7.0-current.

Does it also happen if you disable drm at boot?
