On 19/11/25 06:31, Andy Lutomirski wrote:
> On Fri, Nov 14, 2025, at 7:14 AM, Valentin Schneider wrote:
>> Deferring kernel range TLB flushes requires the guarantee that upon
>> entering the kernel, no stale entry may be accessed. The simplest way to
>> provide such a guarantee is to issue an unconditional flush upon switching
>> to the kernel CR3, as this is the pivoting point where such stale entries
>> may be accessed.
>>
>
> Doing this together with the PTI CR3 switch has no actual benefit: MOV CR3 
> doesn’t flush global pages. And doing this in asm is pretty gross.  We don’t 
> even get a free sync_core() out of it because INVPCID is not documented as 
> being serializing.
>
> Why can’t we do it in C?  What’s the actual risk?  In order to trip over a 
> stale TLB entry, we would need to dereference a pointer to newly allocated 
> kernel virtual memory that was not valid prior to our entry into user mode. I 
> can imagine BPF doing this, but plain noinstr C in the entry path?  
> Especially noinstr C *that has RCU disabled*?  We already can’t follow an RCU 
> pointer, and ISTM the only style of kernel code that might do this would use 
> RCU to protect the pointer, and we are already doomed if we follow an RCU 
> pointer to any sort of memory.
>

So v4 and earlier had the TLB flush faff done in C in the context_tracking entry
just like sync_core().

My biggest issue with it was that I couldn't figure out a way to instrument
memory accesses such that I would get an idea of where vmalloc'd accesses
happen - even with a hackish thing just to survey the landscape. So while I
agree with your reasoning wrt entry noinstr code, I don't have any way to
prove it.
That's unlike the text_poke sync_core() deferral, for which I have all of
that nice objtool instrumentation.

Dave also pointed out that the whole stale entry flush deferral is a risky
move, and that the sanest thing would be to execute the deferred flush just
after switching to the kernel CR3.
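For illustration only, here is a toy userspace model of the deferral protocol being discussed. It is not kernel code, and every name in it (cpu_model, STATE_USER, WORK_TLB_FLUSH, flush_or_defer(), kernel_entry(), user_return()) is hypothetical. It assumes a single per-CPU atomic word that combines "running in user mode" with a "flush pending" bit. A remote flusher may only set the pending bit while the target is in user mode. The entry path clears both bits in one atomic exchange right after the (modelled) CR3 switch, so a request that loses that race falls back to an immediate flush:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU state word (assumption, not the real layout):
 * bit 0 = CPU is running in user mode,
 * bit 1 = a kernel-range TLB flush has been deferred for this CPU. */
#define STATE_USER      0x1u
#define WORK_TLB_FLUSH  0x2u

struct cpu_model {
	atomic_uint state;
	unsigned int flushes_done;   /* flushes executed on this CPU */
	unsigned int ipis_received;  /* flushes forced via a (simulated) IPI */
};

/* Stand-in for the real flush; a real one would also have to drop global
 * entries, which a plain MOV CR3 does not do. */
static void local_flush_kernel_tlb(struct cpu_model *cpu)
{
	cpu->flushes_done++;
}

/* Remote side: a kernel-range flush was requested for @cpu. If the target
 * is in user mode, defer by setting the work bit. Returns true if deferred. */
static bool flush_or_defer(struct cpu_model *cpu)
{
	unsigned int old = atomic_load(&cpu->state);

	while (old & STATE_USER) {
		/* On failure, 'old' is reloaded and the user bit rechecked. */
		if (atomic_compare_exchange_weak(&cpu->state, &old,
						 old | WORK_TLB_FLUSH))
			return true;
	}
	/* Target is (or just went) into the kernel: flush it now via "IPI". */
	cpu->ipis_received++;
	local_flush_kernel_tlb(cpu);
	return false;
}

/* Entry side: models the point just after switching to the kernel CR3.
 * Clearing the user bit and consuming the work bit in a single exchange
 * means no request can be deferred after this point without being seen. */
static void kernel_entry(struct cpu_model *cpu)
{
	unsigned int old = atomic_exchange(&cpu->state, 0u);

	assert(old & STATE_USER);
	if (old & WORK_TLB_FLUSH)
		local_flush_kernel_tlb(cpu);
}

/* Return to user mode: deferral becomes possible again. */
static void user_return(struct cpu_model *cpu)
{
	atomic_store(&cpu->state, STATE_USER);
}
```

Several deferred requests coalesce into one flush on the next entry. The model deliberately ignores an NMI/MCE arriving between the CR3 switch and kernel_entry(), which is the window raised at the end of this mail.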

See the thread surrounding:
  https://lore.kernel.org/lkml/[email protected]/

mainly Dave's reply and subthread:
  https://lore.kernel.org/lkml/[email protected]/

> We do need to watch out for NMI/MCE hitting before we flush.

