On Sun, May 04, 2025 at 05:57:56PM -0700, guent...@openbsd.org wrote:

> On Sun, 4 May 2025, Marcus Glocker wrote:
> > On Sun, May 04, 2025 at 09:36:48AM GMT, Marcus Glocker wrote:
> > 
> > > On Sun, May 04, 2025 at 09:09:37AM GMT, Marcus Glocker wrote:
> > > 
> > > > On Sun, May 04, 2025 at 08:28:31AM GMT, Marcus Glocker wrote:
> > > > 
> > > > > On Sat, May 03, 2025 at 05:34:17PM GMT, Philip Guenther wrote:
> > > > > 
> > > > > > On Sat, 3 May 2025, Marcus Glocker wrote:
> > > > > > > On Sat, May 03, 2025 at 10:08:15PM GMT, Marcus Glocker wrote:
> > > > > > > 
> > > > > > > > On Sat, May 03, 2025 at 09:53:11PM GMT, Marcus Glocker wrote:
> > > > > > > > 
> > > > > > > > > On Sat, May 03, 2025 at 02:42:09PM GMT, George Koehler wrote:
> > > > > > > > > > On Sat, 3 May 2025 08:02:29 +0200
> > > > > > > > > > Marcus Glocker <mar...@nazgul.ch> wrote:
> > > > > > ...
> > > > > > > > > > I don't see a panic message.  I guess that you entered ddb 
> > > > > > > > > > from the
> > > > > > > > > > db_ktrap call at /sys/arch/amd64/amd64/trap.c:323 (below 
> > > > > > > > > > we_re_toast:
> > > > > > > > > > in kerntrap), but I don't know the kind of trap.  It might 
> > > > > > > > > > help to
> > > > > > > > > > move the trap_print call above the db_ktrap call, then 
> > > > > > > > > > build a kernel
> > > > > > > > > > (without your workaround patch) and reproduce.
> > > > > > > > > 
> > > > > > > > > Yes.  That's the output when I move trap_print(), and panic() 
> > > > > > > > > (converted
> > > > > > > > > to an printf) before "if (db_ktrap(type, frame->tf_err, 
> > > > > > > > > frame))":
> > > > > > > > > 
> > > > > > > > > trashcan# halt -p
> > > > > > > > > syncing disks... done
> > > > > > > > > fatal trace trap in supervisor mode
> > > > > > > > > trap type 5 code 0 rip ffffffff8217727d cs 8 rflags 2 cr2 
> > > > > > > > > ffff80003c16fa38 cpl d rsp ffff80003c083430
> > > > > > > > 
> > > > > > > > "rflags 2" means that the Trap Flag (TF) is set I guess.
> > > > > > 
> > > > > > Hmm?  The trap flag aka PSL_T is 0x100.  0x2 is a must-be-one bit.
> > > > > > 
> > > > > > Yes, trap 5 is the debugging trap, but the way to see what caused
> > > > > > it is to examine %dr6.  Perhaps try this on top of your diff moving
> > > > > > up the trap_print() (but without your ignore-T_TRCTRAP-in-kernel
> > > > > > diff), to clear %dr6 during boot and show it if its a trace-trap:
> > > > > 
> > > > > Thanks for the diff!  Attached the complete diff which I've applied to
> > > > > test.  And this is the result:
> > > > > 
> > > > > trashcan# halt -p
> > > > > syncing disks... done
> > > > > fatal trace trap in supervisor mode
> > > > > trap type 5 code 0 rip ffffffff813cb7cd cs 8 rflags 6 cr2 
> > > > > ffff80003c088ec8 cpl d rsp ffff80003c17aed0
> > > > > gsbase 0xffffffff829cdff0  kgsbase 0x0
> > > > > dr6 ffff0ff8
> > > > > Stopped at    x86_bus_space_io_write_4+0x1d:  leave
> > > > > ddb{0}>
> > > > > 
> > > > > When I try to interpret the DR6 register value correctly, according to
> > > > > the documentation, it would mean that Bit 3 B3 (Breakpoint #3 
> > > > > Condition
> > > > > Detected) was set.  Bit 11 BLD (Bus Lock Detection) would be cleared 
> > > > > if
> > > > > detected, which doesn't seem to be the case here since it's set to 1.
> > > > 
> > > > And that's the content of the DR7 register (Debug Control Register)
> > > > during the trap:
> > > > 
> > > > dr7 200004c0
> > > > 
> > > > Which means that local and global breakpoints #3 are enabled (bit 6 and
> > > > 7).  When I clear the DR7 register as well during boot, the trap doesn't
> > > > happen anymore during "halt -p".
> > > 
> > > And the breakpoint #3 condition (bit 29:28) is set to 10b which means:
> > > 
> > > +----------------------------------+
> > > | Value   Break on                   |
> > > +----------------------------------+
> > > | 00b     Instruction execution only |
> > > | 01b     Data writes only           |
> > > | 10b     I/O reads and writes       | <-
> > > | (only defined if CR4.DE=1) |
> > > | 11b     Data reads and writes      |
> > > +----------------------------------+
> > > 
> > > But then it should happen for any read/write I/O, which we should have
> > > plenty before the I/O write to port 0x80.  So I'm not sure why it only
> > > happens at that point ...
> > 
> > And that's the value of the CR4 register during the trap:
> > 
> > cr4 1406f8
> > 
> > Perhaps the debugging extension bit (DE bit 3) is only set during the
> > transition to the S5 state, and that's why the trap during I/O write
> > only gets triggered here.
> > 
> > Anyway, would it be sensible to clear the DR7 register during boot in
> > general to salvage machines like this one?
> 
> Yes, but not just boot: we should also reset them when coming out
> of S3/S4 suspend.
> 
> My previous diff made an incorrect stab at settings the BLD bit in
> %dr6, per recommendation in the SDM.  If/when someone implements
> bus lock detection support they can fix up this ASM to support that;
> no point in doing in now.
> 
> The latter two chunks add cpufunc.h wrappers for reading %dr[67]
> to make the trap.c bit nice.
> 
> If this works for your box, Marcus, I think this is ready to go it.

Your diff looks good to me (I did pretty much the same one for locore0.S
yesterday for testing), and works fine on the MacPro6,1.

I also did a quick zzz regression test for the acpi_wakecode.S part on
my X1 and Go4, without issues.

One very small comment inline.  Other than that, ok mglocker@

> Philip

Thanks,
Marcus

PS:
Do we also need to take care about i386?
 
> Index: amd64/acpi_wakecode.S
> ===================================================================
> RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/acpi_wakecode.S,v
> diff -u -p -r1.50 acpi_wakecode.S
> --- amd64/acpi_wakecode.S     25 Feb 2024 22:33:09 -0000      1.50
> +++ amd64/acpi_wakecode.S     4 May 2025 23:07:50 -0000
> @@ -324,6 +324,11 @@ _ACPI_TRMP_LABEL(.Lacpi_long_mode_trampo
>       andb    $0xF9, 5(%rax,%rcx)
>       ltr     %cx
>  
> +     /* Reset debug control registers */
> +     xorl    %eax,%eax
> +     movq    %rax,%dr6
> +     movq    %rax,%dr7
> +
>       pushq   .Lacpi_saved_fl
>       popfq
>  
> Index: amd64/locore0.S
> ===================================================================
> RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/locore0.S,v
> diff -u -p -r1.26 locore0.S
> --- amd64/locore0.S   4 Oct 2024 21:15:52 -0000       1.26
> +++ amd64/locore0.S   4 May 2025 23:08:02 -0000
> @@ -193,7 +193,12 @@ bi_size_ok:
>       pushl   $PSL_MBO
>       popfl
>  
> -     xorl    %eax,%eax
> +     /* Reset debug control registers */
> +     xorl    %eax,%eax
> +     movl    %eax,%dr6
> +     movl    %eax,%dr7

I think we should use tabs for spacing here as well.

> +
> +     /* %eax still zero from above */
>       cpuid
>       movl    %eax,RELOC(cpuid_level)
>       movl    $RELOC(cpu_vendor),%ebp
> Index: amd64/trap.c
> ===================================================================
> RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/trap.c,v
> diff -u -p -r1.106 trap.c
> --- amd64/trap.c      4 Sep 2024 07:54:51 -0000       1.106
> +++ amd64/trap.c      4 May 2025 23:15:58 -0000
> @@ -465,6 +465,8 @@ trap_print(struct trapframe *frame, int 
>           frame->tf_rflags, rcr2(), curcpu()->ci_ilevel, frame->tf_rsp);
>       printf("gsbase %p  kgsbase %p\n",
>           (void *)rdmsr(MSR_GSBASE), (void *)rdmsr(MSR_KERNELGSBASE));
> +     if (type == T_TRCTRAP)
> +             printf("dr6 %llx dr7 %llx\n", rdr6(), rdr7());
>  }
>  
>  
> Index: include/cpufunc.h
> ===================================================================
> RCS file: /data/src/openbsd/src/sys/arch/amd64/include/cpufunc.h,v
> diff -u -p -r1.43 cpufunc.h
> --- include/cpufunc.h 8 Nov 2024 12:08:22 -0000       1.43
> +++ include/cpufunc.h 4 May 2025 23:04:06 -0000
> @@ -159,6 +159,25 @@ rcr4(void)
>       return (u_int) val64;
>  }
>  
> +/*
> + * DR6 and DR7 debug registers
> + */
> +static inline uint64_t
> +rdr6(void)
> +{
> +     u_int64_t val;
> +     __asm volatile("movq %%dr6,%0" : "=r" (val));
> +     return val;
> +}
> +
> +static inline uint64_t
> +rdr7(void)
> +{
> +     u_int64_t val;
> +     __asm volatile("movq %%dr7,%0" : "=r" (val));
> +     return val;
> +}
> +
>  static __inline void
>  tlbflush(void)
>  {
> 

Reply via email to