On Sun, 4 May 2025, Marcus Glocker wrote: > On Sun, May 04, 2025 at 09:36:48AM GMT, Marcus Glocker wrote: > > > On Sun, May 04, 2025 at 09:09:37AM GMT, Marcus Glocker wrote: > > > > > On Sun, May 04, 2025 at 08:28:31AM GMT, Marcus Glocker wrote: > > > > > > > On Sat, May 03, 2025 at 05:34:17PM GMT, Philip Guenther wrote: > > > > > > > > > On Sat, 3 May 2025, Marcus Glocker wrote: > > > > > > On Sat, May 03, 2025 at 10:08:15PM GMT, Marcus Glocker wrote: > > > > > > > > > > > > > On Sat, May 03, 2025 at 09:53:11PM GMT, Marcus Glocker wrote: > > > > > > > > > > > > > > > On Sat, May 03, 2025 at 02:42:09PM GMT, George Koehler wrote: > > > > > > > > > On Sat, 3 May 2025 08:02:29 +0200 > > > > > > > > > Marcus Glocker <mar...@nazgul.ch> wrote: > > > > > ... > > > > > > > > > I don't see a panic message. I guess that you entered ddb > > > > > > > > > from the > > > > > > > > > db_ktrap call at /sys/arch/amd64/amd64/trap.c:323 (below > > > > > > > > > we_re_toast: > > > > > > > > > in kerntrap), but I don't know the kind of trap. It might > > > > > > > > > help to > > > > > > > > > move the trap_print call above the db_ktrap call, then build > > > > > > > > > a kernel > > > > > > > > > (without your workaround patch) and reproduce. > > > > > > > > > > > > > > > > Yes. That's the output when I move trap_print(), and panic() > > > > > > > > (converted > > > > > > > > to an printf) before "if (db_ktrap(type, frame->tf_err, > > > > > > > > frame))": > > > > > > > > > > > > > > > > trashcan# halt -p > > > > > > > > syncing disks... done > > > > > > > > fatal trace trap in supervisor mode > > > > > > > > trap type 5 code 0 rip ffffffff8217727d cs 8 rflags 2 cr2 > > > > > > > > ffff80003c16fa38 cpl d rsp ffff80003c083430 > > > > > > > > > > > > > > "rflags 2" means that the Trap Flag (TF) is set I guess. > > > > > > > > > > Hmm? The trap flag aka PSL_T is 0x100. 0x2 is a must-be-one bit. > > > > > > > > > > Yes, trap 5 is the debugging trap, but the way to see what caused > > > > > it is to examine %dr6. Perhaps try this on top of your diff moving > > > > > up the trap_print() (but without your ignore-T_TRCTRAP-in-kernel > > > > > diff), to clear %dr6 during boot and show it if its a trace-trap: > > > > > > > > Thanks for the diff! Attached the complete diff which I've applied to > > > > test. And this is the result: > > > > > > > > trashcan# halt -p > > > > syncing disks... done > > > > fatal trace trap in supervisor mode > > > > trap type 5 code 0 rip ffffffff813cb7cd cs 8 rflags 6 cr2 > > > > ffff80003c088ec8 cpl d rsp ffff80003c17aed0 > > > > gsbase 0xffffffff829cdff0 kgsbase 0x0 > > > > dr6 ffff0ff8 > > > > Stopped at x86_bus_space_io_write_4+0x1d: leave > > > > ddb{0}> > > > > > > > > When I try to interpret the DR6 register value correctly, according to > > > > the documentation, it would mean that Bit 3 B3 (Breakpoint #3 Condition > > > > Detected) was set. Bit 11 BLD (Bus Lock Detection) would be cleared if > > > > detected, which doesn't seem to be the case here since it's set to 1. > > > > > > And that's the content of the DR7 register (Debug Control Register) > > > during the trap: > > > > > > dr7 200004c0 > > > > > > Which means that local and global breakpoints #3 are enabled (bit 6 and > > > 7). When I clear the DR7 register as well during boot, the trap doesn't > > > happen anymore during "halt -p". > > > > And the breakpoint #3 condition (bit 29:28) is set to 10b which means: > > > > +----------------------------------+ > > | Value Break on | > > +----------------------------------+ > > | 00b Instruction execution only | > > | 01b Data writes only | > > | 10b I/O reads and writes | <- > > | (only defined if CR4.DE=1) | > > | 11b Data reads and writes | > > +----------------------------------+ > > > > But then it should happen for any read/write I/O, which we should have > > plenty before the I/O write to port 0x80. So I'm not sure why it only > > happens at that point ... > > And that's the value of the CR4 register during the trap: > > cr4 1406f8 > > Perhaps the debugging extension bit (DE bit 3) is only set during the > transition to the S5 state, and that's why the trap during I/O write > only gets triggered here. > > Anyway, would it be sensible to clear the DR7 register during boot in > general to salvage machines like this one?
Yes, but not just boot: we should also reset them when coming out of S3/S4 suspend. My previous diff made an incorrect stab at settings the BLD bit in %dr6, per recommendation in the SDM. If/when someone implements bus lock detection support they can fix up this ASM to support that; no point in doing in now. The latter two chunks add cpufunc.h wrappers for reading %dr[67] to make the trap.c bit nice. If this works for your box, Marcus, I think this is ready to go it. Philip Index: amd64/acpi_wakecode.S =================================================================== RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/acpi_wakecode.S,v diff -u -p -r1.50 acpi_wakecode.S --- amd64/acpi_wakecode.S 25 Feb 2024 22:33:09 -0000 1.50 +++ amd64/acpi_wakecode.S 4 May 2025 23:07:50 -0000 @@ -324,6 +324,11 @@ _ACPI_TRMP_LABEL(.Lacpi_long_mode_trampo andb $0xF9, 5(%rax,%rcx) ltr %cx + /* Reset debug control registers */ + xorl %eax,%eax + movq %rax,%dr6 + movq %rax,%dr7 + pushq .Lacpi_saved_fl popfq Index: amd64/locore0.S =================================================================== RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/locore0.S,v diff -u -p -r1.26 locore0.S --- amd64/locore0.S 4 Oct 2024 21:15:52 -0000 1.26 +++ amd64/locore0.S 4 May 2025 23:08:02 -0000 @@ -193,7 +193,12 @@ bi_size_ok: pushl $PSL_MBO popfl - xorl %eax,%eax + /* Reset debug control registers */ + xorl %eax,%eax + movl %eax,%dr6 + movl %eax,%dr7 + + /* %eax still zero from above */ cpuid movl %eax,RELOC(cpuid_level) movl $RELOC(cpu_vendor),%ebp Index: amd64/trap.c =================================================================== RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/trap.c,v diff -u -p -r1.106 trap.c --- amd64/trap.c 4 Sep 2024 07:54:51 -0000 1.106 +++ amd64/trap.c 4 May 2025 23:15:58 -0000 @@ -465,6 +465,8 @@ trap_print(struct trapframe *frame, int frame->tf_rflags, rcr2(), curcpu()->ci_ilevel, frame->tf_rsp); printf("gsbase %p kgsbase %p\n", (void *)rdmsr(MSR_GSBASE), (void *)rdmsr(MSR_KERNELGSBASE)); + if (type == T_TRCTRAP) + printf("dr6 %llx dr7 %llx\n", rdr6(), rdr7()); } Index: include/cpufunc.h =================================================================== RCS file: /data/src/openbsd/src/sys/arch/amd64/include/cpufunc.h,v diff -u -p -r1.43 cpufunc.h --- include/cpufunc.h 8 Nov 2024 12:08:22 -0000 1.43 +++ include/cpufunc.h 4 May 2025 23:04:06 -0000 @@ -159,6 +159,25 @@ rcr4(void) return (u_int) val64; } +/* + * DR6 and DR7 debug registers + */ +static inline uint64_t +rdr6(void) +{ + u_int64_t val; + __asm volatile("movq %%dr6,%0" : "=r" (val)); + return val; +} + +static inline uint64_t +rdr7(void) +{ + u_int64_t val; + __asm volatile("movq %%dr7,%0" : "=r" (val)); + return val; +} + static __inline void tlbflush(void) {