On Fri, Dec 26, 2025 at 9:40 AM Mark Kettenis <[email protected]> wrote:
>
> > Date: Fri, 26 Dec 2025 10:44:49 +0100
> > From: Mark Kettenis <[email protected]>
> >
> > > From: K R <[email protected]>
> > > Date: Thu, 25 Dec 2025 14:45:54 -0300
> > >
> > > On Thu, Dec 25, 2025 at 2:11 PM Mark Kettenis <[email protected]>
> > > wrote:
> > > >
> > > > > From: K R <[email protected]>
> > > > > Date: Wed, 24 Dec 2025 10:43:34 -0300
> > > > >
> > > > > On Tue, Dec 23, 2025 at 6:07 PM Miod Vallat <[email protected]> wrote:
> > > > > >
> > > > > > > Hi, thanks for the diff -- I've just applied it. According to
> > > > > > > KASSERT(9), DIAGNOSTIC must be enabled in the kernel. Could you
> > > > > > > confirm that?
> > > > > >
> > > > > > DIAGNOSTIC is enabled in GENERIC and GENERIC.MP kernels.
> > > > >
> > > > > You're right, of course. I've applied your patch -- the machine hung
> > > > > building lang/rust, but I got a protection fault again. The ddb
> > > > > session is still on, in case you need symbols examined.
> > > >
> > > > Can you send us the dmesg for this machine?
> > >
> > > Sure.
> >
> > Thanks. Can you try the diff below?
>
> So the elaborate on what's going on here: what our TLB flushing code
> is doing is (for a range flush of [sva, eva)):
>
> tlb_shoot_addr1 = sva;
> tlb_shoot_addr2 = eva;
> send_ipi()
>
> where tlb_shoot_addr1 and tlb_shoot_addr2 are global variables.
>
> The IPI handler (running on a different CPU) then does:
>
> sva = tlb_shoot_addr1;
> eva = tlb_shoot_addr2;
> for (va = sva; va < eva; va += PAGE_SIZE)
> invlpg(va); /* Flush TLB for page */
>
> Now if we're using the x2APIC, send_ipi() is effectively a write to an
> x2APIC MSR. Normally a write to an MSRs is a serializing instruction,
> which would guarantee that the new values for tlb_shoot_addr1 and
> tlb_shoot_addr2 are visible to the other CPUs. However, Intel
> explicitly documents writes to the x2APIC as non-serializing
> instructions. That means they can execute out-of-order. Which means
> that the first pseudocode fragment could execute as:
>
> send_ipi();
> tlb_shoot_addr1 = sva;
> tlb_shoot_addr2 = eva;
>
> And that means the IPI handler on the other CPU might see the *old*
> values for tlb_shoot_addr1 and tlb_shoot_addr2. Linux does an
> "mfence; lfence" before the MSR write that sends an IPI. I learned
> all of this from the following stackoverflow:
>
>
> https://stackoverflow.com/questions/76352933/will-memory-write-be-visible-after-sending-an-ipi-on-x86
>
> There may be better sources though.
>
> Note that on AMD CPUs the x2APIC MSR writes are serializing
> instructions. Also note that we don't use the x2APIC on OpenBSD
> unless the BIOS has it enabled already at boot (or if we're running
> under a hypervisor). That is probably why we haven't seen this issue
> pop up a lot. My suspicion is that this Dell R440 has the x2APIC at
> boot.
>
> Here is an updated version of the diff that prints the x2APIC state
> such that we can verify this hypothesis.
Thanks! I'll apply your diff and report here -- I'm still running
tests with Theo's pmap.c/vector.S diff.
>
>
> Index: arch/amd64/amd64/lapic.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
> diff -u -p -r1.76 lapic.c
> --- arch/amd64/amd64/lapic.c 12 Nov 2025 09:48:52 -0000 1.76
> +++ arch/amd64/amd64/lapic.c 26 Dec 2025 12:40:06 -0000
> @@ -235,6 +235,8 @@ lapic_map(paddr_t lapic_base)
> (uint64_t)va, (uint64_t)lapic_base);
>
> intr_restore(s);
> +
> + printf("%s: x2apic_enabled %d\n", __func__, x2apic_enabled);
> }
>
> /*
> @@ -691,6 +693,7 @@ x2apic_ipi(int vec, int target, int dl)
>
> lo = (target & LAPIC_DEST_MASK) | vec | dl | LAPIC_LVL_ASSERT;
>
> + __asm volatile("mfence; lfence" ::: "memory");
> x2apic_writeicr(hi, lo);
> }
>