On 02/05/25 06:53, Dave Hansen wrote:
> On 5/2/25 02:55, Valentin Schneider wrote:
>> My gripe with that was having two separate mechanisms
>> - super early entry around SWITCH_TO_KERNEL_CR3)
>> - later entry at context tracking
>
> What do you mean by "later entry"?
>
I meant the point at which the deferred operation is run in the current
patches, i.e. ct_kernel_enter(): kernel entry from the PoV of context
tracking.

> All of the paths to enter the kernel from userspace have some
> SWITCH_TO_KERNEL_CR3 variant. If they didn't, the userspace that they
> entered from could have attacked the kernel with Meltdown.
>
> I'm theorizing that if this is _just_ about avoiding TLB flush IPIs that
> you can get away with a single mechanism.

Right now there are the TLB flush IPIs, but also the text_poke() ones
(the sync_core() after patching text). These are the two NOHZ-breaking
IPIs that show up on my HP box, and they are also the ones folks running
NOHZ_FULL + CPU isolation in production have reported to me, mostly on
SPR "edge enhanced" type systems.

Other sources of IPIs have been fixed with ad-hoc solutions: disabling
the mechanism for NOHZ_FULL CPUs, or reworking it so that no IPI is
required, e.g.
https://lore.kernel.org/lkml/ZJtBrybavtb1x45V@tpad/

While I don't expect the list to grow much, it's unfortunately not just
the TLB flush IPIs.
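For illustration only, here is a userspace C model of what a single
deferral mechanism might look like, under one assumption: the "this CPU
is in userspace" state and the pending deferred work share one atomic
word, so a sender cannot race with the target entering the kernel. Every
name in it (struct cpu_state, send_or_defer(), user_enter(),
kernel_enter(), the DEFER_* bits, CT_USER) is hypothetical. It is not the
code in the current patches, and it takes no position on whether the hook
belongs at SWITCH_TO_KERNEL_CR3 or at ct_kernel_enter(). It does show that
the same mechanism has to carry more than TLB flushes: the sync_core()
needed after text_poke() gets its own bit.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical deferrable operations, one bit each. */
#define DEFER_TLB_FLUSH 0x1u
#define DEFER_SYNC_CORE 0x2u
/* Hypothetical "CPU is in userspace" flag, sharing the word with the work bits. */
#define CT_USER 0x80000000u

struct cpu_state {
	atomic_uint ct;       /* CT_USER | pending DEFER_* bits */
	unsigned ipis_sent;   /* plain counters: the model is single-threaded */
	unsigned tlb_flushes;
	unsigned core_syncs;
};

static void cpu_init(struct cpu_state *cpu)
{
	atomic_init(&cpu->ct, 0u);  /* starts in the kernel, nothing pending */
	cpu->ipis_sent = 0;
	cpu->tlb_flushes = 0;
	cpu->core_syncs = 0;
}

/* Stand-in for actually flushing the TLB / serializing the core. */
static void run_work(struct cpu_state *cpu, unsigned work)
{
	if (work & DEFER_TLB_FLUSH)
		cpu->tlb_flushes++;
	if (work & DEFER_SYNC_CORE)
		cpu->core_syncs++;
}

/* Sender: queue the work if the target is in userspace, otherwise "IPI" it. */
static void send_or_defer(struct cpu_state *cpu, unsigned work)
{
	unsigned old = atomic_load(&cpu->ct);

	while (old & CT_USER) {
		/* Succeeds only if the target is still in userspace. */
		if (atomic_compare_exchange_weak(&cpu->ct, &old, old | work))
			return;
		/* On failure, old holds the fresh value; recheck CT_USER. */
	}
	cpu->ipis_sent++;
	run_work(cpu, work);
}

/* Return to userspace: nothing can be pending while in the kernel. */
static void user_enter(struct cpu_state *cpu)
{
	assert((atomic_load(&cpu->ct) & ~CT_USER) == 0);
	atomic_store(&cpu->ct, CT_USER);
}

/* Kernel entry: leave user state and collect pending work in one step. */
static void kernel_enter(struct cpu_state *cpu)
{
	unsigned old = atomic_exchange(&cpu->ct, 0u);

	run_work(cpu, old & ~CT_USER);
}
```

In this model, repeated requests of the same kind coalesce into a single
run at the next kernel entry, which is only acceptable for idempotent
operations such as a full TLB flush or a core serialization.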