On 19/11/25 06:31, Andy Lutomirski wrote:
> On Fri, Nov 14, 2025, at 7:14 AM, Valentin Schneider wrote:
>> Deferring kernel range TLB flushes requires the guarantee that upon
>> entering the kernel, no stale entry may be accessed. The simplest way to
>> provide such a guarantee is to issue an unconditional flush upon switching
>> to the kernel CR3, as this is the pivoting point where such stale entries
>> may be accessed.
>>
>
> Doing this together with the PTI CR3 switch has no actual benefit: MOV CR3
> doesn’t flush global pages. And doing this in asm is pretty gross. We don’t
> even get a free sync_core() out of it because INVPCID is not documented as
> being serializing.
>
> Why can’t we do it in C? What’s the actual risk? In order to trip over a
> stale TLB entry, we would need to dereference a pointer to newly allocated
> kernel virtual memory that was not valid prior to our entry into user mode. I
> can imagine BPF doing this, but plain noinstr C in the entry path?
> Especially noinstr C *that has RCU disabled*? We already can’t follow an RCU
> pointer, and ISTM the only style of kernel code that might do this would use
> RCU to protect the pointer, and we are already doomed if we follow an RCU
> pointer to any sort of memory.
>

So v4 and earlier had the TLB flush faff done in C in the context_tracking
entry, just like sync_core().

My biggest issue with it was that I couldn't figure out a way to instrument
memory accesses such that I would get an idea of where vmalloc'd accesses
happen - even with a hackish thing just to survey the landscape. So while I
agree with your reasoning wrt entry noinstr code, I don't have any way to
prove it. That's unlike the text_poke sync_core() deferral, for which I have
all of that nice objtool instrumentation.

Dave also pointed out that the whole stale entry flush deferral is a risky
move, and that the sanest thing would be to execute the deferred flush just
after switching to the kernel CR3.

See the thread surrounding:
  https://lore.kernel.org/lkml/[email protected]/

mainly Dave's reply and subthread:
  https://lore.kernel.org/lkml/[email protected]/

> We do need to watch out for NMI/MCE hitting before we flush.
