> Date: Fri, 26 Dec 2025 10:44:49 +0100
> From: Mark Kettenis <[email protected]>
> 
> > From: K R <[email protected]>
> > Date: Thu, 25 Dec 2025 14:45:54 -0300
> > 
> > On Thu, Dec 25, 2025 at 2:11 PM Mark Kettenis <[email protected]> 
> > wrote:
> > >
> > > > From: K R <[email protected]>
> > > > Date: Wed, 24 Dec 2025 10:43:34 -0300
> > > >
> > > > On Tue, Dec 23, 2025 at 6:07 PM Miod Vallat <[email protected]> wrote:
> > > > >
> > > > > > Hi, thanks for the diff -- I've just applied it.  According to
> > > > > > KASSERT(9), DIAGNOSTIC must be enabled in the kernel.  Could you
> > > > > > confirm that?
> > > > >
> > > > > DIAGNOSTIC is enabled in GENERIC and GENERIC.MP kernels.
> > > >
> > > > You're right, of course.  I've applied your patch  -- the machine hung
> > > > building lang/rust, but I got a protection fault again.  The ddb
> > > > session is still on, in case you need symbols examined.
> > >
> > > Can you send us the dmesg for this machine?
> > 
> > Sure.
> 
> Thanks.  Can you try the diff below?

So the elaborate on what's going on here: what our TLB flushing code
is doing is (for a range flush of [sva, eva)):

    tlb_shoot_addr1 = sva;
    tlb_shoot_addr2 = eva;
    send_ipi()

where tlb_shoot_addr1 and tlb_shoot_addr2 are global variables.

The IPI handler (running on a different CPU) then does:

    sva = tlb_shoot_addr1;
    eva = tlb_shoot_addr2;
    for (va = sva; va < eva; va += PAGE_SIZE)
        invlpg(va); /* Flush TLB for page */

Now if we're using the x2APIC, send_ipi() is effectively a write to an
x2APIC MSR.  Normally a write to an MSRs is a serializing instruction,
which would guarantee that the new values for tlb_shoot_addr1 and
tlb_shoot_addr2 are visible to the other CPUs.  However, Intel
explicitly documents writes to the x2APIC as non-serializing
instructions.  That means they can execute out-of-order.  Which means
that the first pseudocode fragment could execute as:

   send_ipi();
   tlb_shoot_addr1 = sva;
   tlb_shoot_addr2 = eva;

And that means the IPI handler on the other CPU might see the *old*
values for tlb_shoot_addr1 and tlb_shoot_addr2.  Linux does an
"mfence; lfence" before the MSR write that sends an IPI.  I learned
all of this from the following stackoverflow:

  
https://stackoverflow.com/questions/76352933/will-memory-write-be-visible-after-sending-an-ipi-on-x86

There may be better sources though.

Note that on AMD CPUs the x2APIC MSR writes are serializing
instructions.  Also note that we don't use the x2APIC on OpenBSD
unless the BIOS has it enabled already at boot (or if we're running
under a hypervisor).  That is probably why we haven't seen this issue
pop up a lot.  My suspicion is that this Dell R440 has the x2APIC at
boot.

Here is an updated version of the diff that prints the x2APIC state
such that we can verify this hypothesis.


Index: arch/amd64/amd64/lapic.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
diff -u -p -r1.76 lapic.c
--- arch/amd64/amd64/lapic.c    12 Nov 2025 09:48:52 -0000      1.76
+++ arch/amd64/amd64/lapic.c    26 Dec 2025 12:40:06 -0000
@@ -235,6 +235,8 @@ lapic_map(paddr_t lapic_base)
            (uint64_t)va, (uint64_t)lapic_base);
 
        intr_restore(s);
+
+       printf("%s: x2apic_enabled %d\n", __func__, x2apic_enabled);
 }
 
 /*
@@ -691,6 +693,7 @@ x2apic_ipi(int vec, int target, int dl)
 
        lo = (target & LAPIC_DEST_MASK) | vec | dl | LAPIC_LVL_ASSERT;
 
+       __asm volatile("mfence; lfence" ::: "memory");
        x2apic_writeicr(hi, lo);
 }
 

Reply via email to