On Mon, 2 Feb 2026 17:13:34 +0800, [email protected] wrote:
> In the current KLP transition implementation, the strategy for running
> tasks relies on waiting for a context switch to clear the
> TIF_PATCH_PENDING flag, or on inspecting the task's stack once it has
> yielded the CPU to determine whether the flag can be cleared. However,
> this approach proves problematic in certain environments.
> 
> Consider a scenario where the majority of system CPUs are configured
> with nohz_full and isolcpus, each dedicated to a VM with a vCPU pinned
> to that physical core and with idle=poll configured inside the guest.
> Under such conditions, these vCPUs rarely leave the CPU. Combined with
> the high core counts typical of modern server platforms, this results
> in transition completion times that are not only excessively long but
> also highly unpredictable.
> 
> This patch resolves the issue by queueing a stop-work callback (via
> the stop_machine infrastructure) on the CPU of a task that is still
> running. The callback attempts to transition that task while it is
> briefly stopped. In a VM environment configured with 32 CPUs, the live
> patching operation completes promptly after the SIGNALS_TIMEOUT period
> with this patch applied; without it, the transition barely completes
> at all under the same scenario.
> 
> Co-developed-by: Rui Qi <[email protected]>
> Signed-off-by: Rui Qi <[email protected]>
> Signed-off-by: Li Zhe <[email protected]>
> ---
>  kernel/livepatch/transition.c | 62 ++++++++++++++++++++++++++++++++---
>  1 file changed, 58 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 2351a19ac2a9..9c078b9bd755 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -10,6 +10,7 @@
>  #include <linux/cpu.h>
>  #include <linux/stacktrace.h>
>  #include <linux/static_call.h>
> +#include <linux/stop_machine.h>
>  #include "core.h"
>  #include "patch.h"
>  #include "transition.h"
> @@ -297,6 +298,61 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
>  	return 0;
>  }
>  
> +enum klp_stop_work_bit {
> +	KLP_STOP_WORK_PENDING_BIT,
> +};
> +
> +struct klp_stop_work_info {
> +	struct task_struct *task;
> +	unsigned long flag;
> +};
> +
> +static DEFINE_PER_CPU(struct cpu_stop_work, klp_transition_stop_work);
> +static DEFINE_PER_CPU(struct klp_stop_work_info, klp_stop_work_info);
> +
> +static int klp_check_task(struct task_struct *task, void *old_name)
> +{
> +	if (task == current)
> +		return klp_check_and_switch_task(current, old_name);
> +	else
> +		return task_call_func(task, klp_check_and_switch_task, old_name);
> +}
> +
> +static int klp_transition_stop_work_fn(void *arg)
> +{
> +	struct klp_stop_work_info *info = (struct klp_stop_work_info *)arg;
> +	struct task_struct *task = info->task;
> +	const char *old_name;
> +
> +	clear_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag);
> +
> +	if (likely(klp_patch_pending(task)))
> +		klp_check_task(task, &old_name);
> +
> +	put_task_struct(task);
> +
> +	return 0;
> +}
> +
> +static void klp_try_transition_running_task(struct task_struct *task)
> +{
> +	int cpu = task_cpu(task);
> +
> +	if (klp_signals_cnt && !(klp_signals_cnt % SIGNALS_TIMEOUT)) {
> +		struct klp_stop_work_info *info =
> +			per_cpu_ptr(&klp_stop_work_info, cpu);
> +
> +		if (test_and_set_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag))
> +			return;
> +
> +		info->task = get_task_struct(task);
> +		if (!stop_one_cpu_nowait(cpu, klp_transition_stop_work_fn, info,
> +					 per_cpu_ptr(&klp_transition_stop_work, cpu)))
> +			put_task_struct(task);
> +	}
> +}
> +
>  /*
>   * Try to safely switch a task to the target patch state. If it's currently
>   * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
> @@ -323,10 +379,7 @@ static bool klp_try_switch_task(struct task_struct *task)
>  	 * functions. If all goes well, switch the task to the target patch
>  	 * state.
>  	 */
> -	if (task == current)
> -		ret = klp_check_and_switch_task(current, &old_name);
> -	else
> -		ret = task_call_func(task, klp_check_and_switch_task, &old_name);
> +	ret = klp_check_task(task, &old_name);
>  
>  	switch (ret) {
>  	case 0:		/* success */
> @@ -335,6 +388,7 @@ static bool klp_try_switch_task(struct task_struct *task)
>  	case -EBUSY:	/* klp_check_and_switch_task() */
>  		pr_debug("%s: %s:%d is running\n",
>  			 __func__, task->comm, task->pid);
> +		klp_try_transition_running_task(task);
>  		break;
>  	case -EINVAL:	/* klp_check_and_switch_task() */
>  		pr_debug("%s: %s:%d has an unreliable stack\n",
> -- 
> 2.20.1
Hi all,

Just a gentle ping on this patch. Please let me know if there's anything
I can improve or if you need more information.

Thanks,
Zhe

