On Tue, 3 Feb 2026 18:20:22 -0800, [email protected] wrote:
> On Mon, Feb 02, 2026 at 05:13:34PM +0800, Li Zhe wrote:
> > In the current KLP transition implementation, the strategy for running
> > tasks relies on waiting for a context switch to attempt to clear the
> > TIF_PATCH_PENDING flag. Alternatively, determine whether the
> > TIF_PATCH_PENDING flag can be cleared by inspecting the stack once the
> > process has yielded the CPU. However, this approach proves problematic
> > in certain environments.
> >
> > Consider a scenario where the majority of system CPUs are configured
> > with nohzfull and isolcpus, each dedicated to a VM with a vCPU pinned
> > to that physical core and configured with idle=poll within the guest.
> > Under such conditions, these vCPUs rarely leave the CPU. Combined with
> > the high core counts typical of modern server platforms, this results
> > in transition completion times that are not only excessively prolonged
> > but also highly unpredictable.
> >
> > This patch resolves this issue by registering a callback with
> > stop_machine. The callback attempts to transition the associated running
> > task. In a VM environment configured with 32 CPUs, the live patching
> > operation completes promptly after the SIGNALS_TIMEOUT period with this
> > patch applied; without it, the process nearly fails to complete under
> > the same scenario.
> >
> > Co-developed-by: Rui Qi <[email protected]>
> > Signed-off-by: Rui Qi <[email protected]>
> > Signed-off-by: Li Zhe <[email protected]>
>
> PeterZ, what's your take on this?
>
> I wonder if we could instead do resched_cpu() or something similar to
> trigger the call to klp_sched_try_switch() in __schedule()?
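
klp_sched_try_switch() only invokes __klp_sched_try_switch() after
verifying that the outgoing task has TASK_FREEZABLE set in its state.
A running task that is merely preempted by a forced reschedule is
still in TASK_RUNNING, so it would not pass that check. I therefore
remain uncertain whether this approach adequately resolves the issue.

Thanks,
Zhe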

