On Thu 2025-02-27 10:47:33, Yafang Shao wrote:
> The tasklist_lock in the KLP transition might result in high latency under
> a specific workload. We can replace it with RCU.
>
> After a new task is forked, its kernel stack is always set to empty[0].
> Therefore, we can init these new tasks to the KLP_TRANSITION_IDLE state
> after they are forked. If these tasks are forked during the KLP
> transition but before klp_check_and_switch_task(), they can be safely
> skipped during klp_check_and_switch_task(). Additionally, if
> klp_ftrace_handler() is triggered right after forking, the task can
> determine which function to use based on the klp_target_state.
>
> With the above change, we can safely convert the tasklist_lock to RCU.
>
> Link: https://lore.kernel.org/all/20250213173253.ovivhuq2c5rmvkhj@jpoimboe/ [0]
> Link: https://lore.kernel.org/all/20250214181206.xkvxohoc4ft26uhf@jpoimboe/ [1]
> Suggested-by: Josh Poimboeuf <jpoim...@kernel.org>
> Signed-off-by: Yafang Shao <laoar.s...@gmail.com>
> ---
> --- a/kernel/livepatch/patch.c
> +++ b/kernel/livepatch/patch.c
> @@ -95,7 +95,13 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>
>  		patch_state = current->patch_state;
>
> -		WARN_ON_ONCE(patch_state == KLP_TRANSITION_IDLE);
> +		/* If the patch_state is KLP_TRANSITION_IDLE, it means the task
> +		 * was forked after klp_init_transition(). In this case, the
> +		 * newly forked task can determine which function to use based
> +		 * on the klp_target_state.
> +		 */
> +		if (patch_state == KLP_TRANSITION_IDLE)
> +			patch_state = klp_target_state;
>
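Just to illustrate, with this change the decision path in klp_ftrace_handler()
would roughly be (a simplified excerpt of the func->transition branch only;
declarations, recursion locking and the func lookup are omitted):

	if (unlikely(func->transition)) {
		/* The corresponding write barrier is in klp_init_transition(). */
		smp_rmb();

		patch_state = current->patch_state;

		/*
		 * Proposed change: a task forked after klp_init_transition()
		 * still has KLP_TRANSITION_IDLE here, so the decision is made
		 * from the global klp_target_state as it happens to be at
		 * this moment, with no synchronization against
		 * klp_reverse_transition().
		 */
		if (patch_state == KLP_TRANSITION_IDLE)
			patch_state = klp_target_state;

		if (patch_state == KLP_TRANSITION_UNPATCHED) {
			/* Use the previous function on the func stack. */
			func = list_entry_rcu(func->stack_node.next,
					      struct klp_func, stack_node);
			if (&func->stack_node == &ops->func_stack)
				goto unlock;
		}
	}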
This change is racy, for example:

CPU0						CPU1

						task_0 (forked process):

						funcA
						  klp_ftrace_handler()
						    if (patch_state == KLP_TRANSITION_IDLE)
							patch_state = klp_target_state
							# set to KLP_TRANSITION_PATCHED
						    # redirected to klp_funcA()

echo 0 >/sys/kernel/livepatch/patch1/enabled
  klp_reverse_transition()
    klp_target_state = !klp_target_state;
    # set to KLP_TRANSITION_UNPATCHED

						funcB
						  klp_ftrace_handler()
						    if (patch_state == KLP_TRANSITION_IDLE)
							patch_state = klp_target_state
							# set to KLP_TRANSITION_UNPATCHED
						    # staying in the original funcB

BANG: The livepatched funcA() and the original funcB() are called by the
      same task => broken consistency model.

BTW: This is what I tried to warn you about at
https://lore.kernel.org/r/z69wuhve2vnsr...@pathway.suse.cz

See below for more:

>  		if (patch_state == KLP_TRANSITION_UNPATCHED) {
>  			/*
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index ba069459c101..af76defca67a 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -23,7 +23,7 @@ static DEFINE_PER_CPU(unsigned long[MAX_STACK_ENTRIES], klp_stack_entries);
>
>  struct klp_patch *klp_transition_patch;
>
> -static int klp_target_state = KLP_TRANSITION_IDLE;
> +int klp_target_state = KLP_TRANSITION_IDLE;
>
>  static unsigned int klp_signals_cnt;
>
> @@ -294,6 +294,14 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
>  {
>  	int ret;
>
> +	/*
> +	 * If the patch_state remains KLP_TRANSITION_IDLE at this point, it
> +	 * indicates that the task was forked after klp_init_transition().
> +	 * In this case, it is safe to skip the task.
> +	 */
> +	if (!test_tsk_thread_flag(task, TIF_PATCH_PENDING))
> +		return 0;
> +
>  	if (task_curr(task) && task != current)
>  		return -EBUSY;
>
> @@ -466,11 +474,11 @@ void klp_try_complete_transition(void)
>  	 * Usually this will transition most (or all) of the tasks on a system
>  	 * unless the patch includes changes to a very common function.
>  	 */
> -	read_lock(&tasklist_lock);
> +	rcu_read_lock();
>  	for_each_process_thread(g, task)
>  		if (!klp_try_switch_task(task))
>  			complete = false;
> -	read_unlock(&tasklist_lock);
> +	rcu_read_unlock();
>
>  	/*
>  	 * Ditto for the idle "swapper" tasks.
> @@ -694,25 +702,10 @@ void klp_reverse_transition(void)
>  }
>
>  /* Called from copy_process() during fork */
> -void klp_copy_process(struct task_struct *child)
> +void klp_init_process(struct task_struct *child)
>  {
> -
> -	/*
> -	 * The parent process may have gone through a KLP transition since
> -	 * the thread flag was copied in setup_thread_stack earlier.  Bring
> -	 * the task flag up to date with the parent here.
> -	 *
> -	 * The operation is serialized against all klp_*_transition()
> -	 * operations by the tasklist_lock.  The only exceptions are
> -	 * klp_update_patch_state(current) and __klp_sched_try_switch(), but we
> -	 * cannot race with them because we are current.
> -	 */
> -	if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
> -		set_tsk_thread_flag(child, TIF_PATCH_PENDING);
> -	else
> -		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
> -
> -	child->patch_state = current->patch_state;
> +	clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
> +	child->patch_state = KLP_TRANSITION_IDLE;

I thought that we might do:

	child->patch_state = klp_target_state;

to avoid the race in the klp_ftrace_handler() described above.
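That is, klp_init_process() would become something like the following
(just a sketch; klp_target_state would still have to be made non-static
as in the patch above):

	/* Called from copy_process() during fork - sketched alternative */
	void klp_init_process(struct task_struct *child)
	{
		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);

		/*
		 * Inherit the current global target state instead of
		 * KLP_TRANSITION_IDLE, so that klp_ftrace_handler() never
		 * sees an IDLE task. Note that klp_target_state is read
		 * here without any synchronization against
		 * klp_enable_patch()/klp_init_transition().
		 */
		child->patch_state = klp_target_state;
	}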
But the following might happen:

CPU0						CPU1

klp_init_process(child)
  child->patch_state = KLP_TRANSITION_IDLE
  # klp_target_state is still KLP_TRANSITION_IDLE here

						klp_enable_patch()
						  # setup ftrace handlers
						  # initialize all processes
						  #   in the task list

# add "child" into the task list

schedule()

[running child now]

funcA()
  klp_ftrace_handler()
    child->patch_state = KLP_TRANSITION_IDLE

BANG: Same problem as with the original patch.

Hmm, the 2nd version of this patchset tried to do:

diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index 90408500e5a3..5e523a3fbb3c 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -95,7 +95,12 @@ static void notrace klp_ftrace_handler(unsigned long ip,
 
 		patch_state = current->patch_state;
 
-		WARN_ON_ONCE(patch_state == KLP_TRANSITION_IDLE);
+		/* If the patch_state is KLP_TRANSITION_IDLE, it indicates the
+		 * task was forked after klp_init_transition(). For this newly
+		 * forked task, it is safe to switch it to klp_target_state.
+		 */
+		if (patch_state == KLP_TRANSITION_IDLE)
+			current->patch_state = klp_target_state;
 
 		if (patch_state == KLP_TRANSITION_UNPATCHED) {
 			/*

Note: It is broken. It sets current->patch_state but it later checks
      the local variable "patch_state" which is still KLP_TRANSITION_IDLE.

But is it safe when we fix it?

It might be, as long as we run klp_synchronize_transition() between
updating the global @klp_target_state and the other operations.

Especially, we need to make sure that @klp_target_state always has either
KLP_TRANSITION_PATCHED or KLP_TRANSITION_UNPATCHED when func->transition
is set. For this, we would need to add klp_synchronize_transition() into:

  + klp_init_transition(), between setting @klp_target_state and setting
    func->transition = true.

  + klp_complete_transition(), also for the KLP_TRANSITION_UNPATCHED
    target state. It is currently called there only for the PATCHED
    target state.

Will this be enough?

FACT: It is more complicated than it looked.

QUESTION: Is this worth the effort?

NOTE: The rcu_read_lock() does not solve the reported stall. We are
      spending time on it only because it looked nice and simple to you.

My opinion: I would personally prefer to focus on solving the stall and
the use-after-free access in do_exit().

Best Regards,
Petr