On 5/11/26 19:54, Uladzislau Rezki (Sony) wrote:
> From: "Paul E. McKenney" <[email protected]>
> 
> While an srcu_struct structure is in the midst of switching from CPU-0
> to all-CPUs state, it can attempt to invoke callbacks for CPUs that
> have never been online.  Worse yet, it can attempt to invoke callbacks
> for CPUs that never will be online, even including imaginary CPUs not in
> cpu_possible_mask.  This can cause hangs on s390,

Uladzislau, Paul, according to the Fixes tag below, this patch fixes a
commit that went into 7.0-rc6 and that apparently causes a "hang" on
some architectures. So shouldn't this be heading to mainline instead of
-next? Ideally with a stable tag to ensure backporting to 7.0.y, although
that is a separate decision.

I had an eye on this issue after noticing Samir's report:
https://lore.kernel.org/lkml/[email protected]/

And the jury is still out, but Jiri is dealing with some issues that
might or might not be related to the problem this fixes, too:
https://lore.kernel.org/all/[email protected]/

Ciao, Thorsten


> which is not set up to
> deal with workqueue handlers being scheduled on such CPUs.  This commit
> therefore causes Tree SRCU to refrain from queueing workqueue handlers
> on CPUs that have not yet (and might never) come online.
> 
> Because callbacks are not invoked on CPUs that have not been
> online, it is an error to invoke call_srcu(), synchronize_srcu(), or
> synchronize_srcu_expedited() on a CPU that is not yet fully online.
> However, it turns out to be less code to redirect the callbacks
> from too-early invocations of call_srcu() than to warn about such
> invocations.  This commit therefore also redirects callbacks queued on
> not-yet-fully-online CPUs to the boot CPU.
> 
> Reported-by: Vasily Gorbik <[email protected]>
> Fixes: 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when non-preemptible")
> Signed-off-by: Paul E. McKenney <[email protected]>
> Tested-by: Vasily Gorbik <[email protected]>
> Tested-by: Samir <[email protected]>
> Reviewed-by: Shrikanth Hegde <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
> ---
>  kernel/rcu/srcutree.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 0d01cd8c4b4a..7c2f7cc131f7 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>  {
>       int cpu;
>  
> -     for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> -             if (!(mask & (1UL << (cpu - snp->grplo))))
> -                     continue;
> -             srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> -     }
> +     for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
> +             if ((mask & (1UL << (cpu - snp->grplo))) && rcu_cpu_beenfullyonline(cpu))
> +                     srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>  }
>  
>  /*
> @@ -1322,7 +1320,9 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
>        */
>       idx = __srcu_read_lock_nmisafe(ssp);
>       ss_state = smp_load_acquire(&ssp->srcu_sup->srcu_size_state);
> -     if (ss_state < SRCU_SIZE_WAIT_CALL)
> +     // If !rcu_cpu_beenfullyonline(), interrupts are still disabled,
> +     // so no migration is possible in either direction from this CPU.
> +     if (ss_state < SRCU_SIZE_WAIT_CALL || !rcu_cpu_beenfullyonline(raw_smp_processor_id()))
>               sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
>       else
>               sdp = raw_cpu_ptr(ssp->sda);
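
For anyone following along without the tree at hand, the two hunks above
can be modeled in user space roughly as follows. This is only a sketch
under stated assumptions, not the kernel code: been_fully_online[],
cpu_been_fully_online(), pick_cbs_cpus() and pick_enqueue_cpu() are
hypothetical stand-ins for rcu_cpu_beenfullyonline(), the loop in
srcu_schedule_cbs_snp() and the sdp selection in
srcu_gp_start_if_needed(). It also assumes a group spans fewer CPUs than
there are bits in an unsigned long.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 64

/* Hypothetical stand-in for the kernel's "has been fully online" state. */
static bool been_fully_online[MAX_CPUS];

/* Hypothetical stand-in for rcu_cpu_beenfullyonline(). */
static bool cpu_been_fully_online(int cpu)
{
	return cpu >= 0 && cpu < MAX_CPUS && been_fully_online[cpu];
}

/*
 * Model of the first hunk: walk the CPUs in [grplo, grphi] and keep only
 * those whose bit is set in mask AND that have been fully online.  The
 * kernel would schedule a workqueue handler for each such CPU; this
 * sketch just records the CPU numbers in out[] and returns the count.
 */
static size_t pick_cbs_cpus(int grplo, int grphi, unsigned long mask,
			    int *out, size_t out_len)
{
	size_t n = 0;

	for (int cpu = grplo; cpu <= grphi; cpu++) {
		if (!(mask & (1UL << (cpu - grplo))))
			continue;	/* no callbacks pending for this CPU */
		if (!cpu_been_fully_online(cpu))
			continue;	/* never online: do not queue work there */
		if (n < out_len)
			out[n++] = cpu;
	}
	return n;
}

/*
 * Model of the second hunk: callbacks go to the boot CPU's queue unless
 * the per-CPU state is ready AND the current CPU has been fully online.
 */
static int pick_enqueue_cpu(bool size_state_ready, int this_cpu, int boot_cpu)
{
	if (!size_state_ready || !cpu_been_fully_online(this_cpu))
		return boot_cpu;
	return this_cpu;
}
```

In this model, a CPU whose mask bit is set but which has never been
online is simply skipped, and a call made from such a CPU lands on the
boot CPU's queue, which matches the behavior the commit message
describes.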

