> > On 5/9/2025 3:07 PM, Joel Fernandes wrote: > > > > > > On 5/7/2025 7:26 AM, Zqiang wrote: > >> In the preparation stage of CPU online, if the corresponding > >> the rdp's->nocb_cb_kthread does not exist, will be created, > >> there is a situation where the rdp's rcuop kthreads creation fails, > >> and then de-offload this CPU's rdp, does not assign this CPU's > >> rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and > >> rdp's->rdp_gp->nocb_gp_kthread is still valid. > > Maybe I mixed up what you're doing but this commit message is a bit hard to > parse. You're saying in the problematic scenario, "rdp's->nocb_gp_rdp" is > valid, > but in the patch below you're checking for "rdp's->nocb_gp_rdp". So I am a bit > confused, why you would not run into the same problematic scenario. I think I > agree with your patch but it would be good to refine and clarify the > problematic > condition, the commit message, and also please add some comments to the code > :).
Hello, Joel Maybe my description is not clear enough. The scenario I want to describe is: This is a small probability event,the rdp->nocb_cb_kthread creation may fail in rcu_spawn_cpu_nocb_kthread(), this rdp->nocb_cb_kthread will not set,is null pointer. but this rdp->rdp_gp->nocb_gp_kthread and rdp->nocb_gp_rdp pointer is valid. but in rcu_nocb_rdp_offload(), we only check rdp->nocb_gp_rdp and rdp->rdp_gp->nocb_gp_kthread, this will cause kthread_unpark() access null rdp->nocb_cb_kthread pointer. so I use rdp->nocb_gp_kthread condition to make a judgment, if the rdp->nocb_gp_kthread is valid, this means that rdp->nocb_cb_kthread is also valid and there is no kthreads creation failure in rcu_spawn_cpu_nocb_kthread() Thanks Zqiang > > Do you have a reproducer for this? If not, maybe we can work on some test > cases > Both CPU hotplug and offload is tested by rcutorture, so I'd expect we run > into > it. But perhaps not, because it requires kthread creation to fail? > > thanks, > > - Joel > > > >> > >> This will cause the subsequent re-offload operation of this offline > >> CPU, which will pass the conditional check and the kthread_unpark() > >> will access invalid rdp's->nocb_cb_kthread pointer. > >> > >> This commit therefore use rdp's->nocb_gp_kthread instead of > >> rdp_gp's->nocb_gp_kthread for safety check. > >> > >> Signed-off-by: Zqiang <qiang.zhang1...@gmail.com> > > > > Is it possible that rdp_gp->nocb_gp_kthread is valid, but > > rdp->nocb_cb_kthread > > is still invalid (because its creation failed?). This seems a bit fragile > > to me > > to assume that if rdp_gp->nocb_gp_kthread then rdp->nocb_cb_kthread is > > valid. Or > > is there a path that makes sure this invariant is always satisfied? If so, > > can > > we add additional warnings for checking this invariant? > > > > Also from the other thread, it sounds like there is more work to do here > > (related patches so I'd like to defer this to 6.17 - feel free to keep > > posting > > patches for this work though). > > > > Thanks! > > > > - Joel > > > >> --- > >> kernel/rcu/tree_nocb.h | 5 ++--- > >> 1 file changed, 2 insertions(+), 3 deletions(-) > >> > >> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > >> index 1596812f7f12..6679140bb0b5 100644 > >> --- a/kernel/rcu/tree_nocb.h > >> +++ b/kernel/rcu/tree_nocb.h > >> @@ -1146,7 +1146,6 @@ static bool rcu_nocb_rdp_offload_wait_cond(struct > >> rcu_data *rdp) > >> static int rcu_nocb_rdp_offload(struct rcu_data *rdp) > >> { > >> int wake_gp; > >> - struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > >> > >> WARN_ON_ONCE(cpu_online(rdp->cpu)); > >> /* > >> @@ -1156,7 +1155,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp) > >> if (!rdp->nocb_gp_rdp) > >> return -EINVAL; > >> > >> - if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread)) > >> + if (WARN_ON_ONCE(!rdp->nocb_gp_kthread)) > >> return -EINVAL; > >> > >> pr_info("Offloading %d\n", rdp->cpu); > >> @@ -1166,7 +1165,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp) > >> > >> wake_gp = rcu_nocb_queue_toggle_rdp(rdp); > >> if (wake_gp) > >> - wake_up_process(rdp_gp->nocb_gp_kthread); > >> + wake_up_process(rdp->nocb_gp_kthread); > >> > >> swait_event_exclusive(rdp->nocb_state_wq, > >> rcu_nocb_rdp_offload_wait_cond(rdp)); > > >