Re: [mainline][BUG] Observed Workqueue lockups on offline CPUs.

Samir M Mon, 27 Apr 2026 04:34:51 -0700


On 27/04/26 3:32 pm, Samir M wrote:

Hi Paul,
I've been testing the latest upstream kernel on a PowerPC system and
encountered workqueue lockup issues that I've bisected to commit
61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
non-preemptible").

After booting, I'm seeing workqueue lockup warnings for CPUs 81-96,
which are offline on my system. The workqueues remain stuck for over
237 seconds:

[ 243.309302][ C0] BUG: workqueue lockup - pool cpus=81 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309311][ C0] BUG: workqueue lockup - pool cpus=82 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309318][ C0] BUG: workqueue lockup - pool cpus=83 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309326][ C0] BUG: workqueue lockup - pool cpus=84 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309333][ C0] BUG: workqueue lockup - pool cpus=85 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309341][ C0] BUG: workqueue lockup - pool cpus=86 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309348][ C0] BUG: workqueue lockup - pool cpus=87 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309355][ C0] BUG: workqueue lockup - pool cpus=88 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309363][ C0] BUG: workqueue lockup - pool cpus=89 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309370][ C0] BUG: workqueue lockup - pool cpus=90 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309377][ C0] BUG: workqueue lockup - pool cpus=91 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309384][ C0] BUG: workqueue lockup - pool cpus=92 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309392][ C0] BUG: workqueue lockup - pool cpus=93 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309399][ C0] BUG: workqueue lockup - pool cpus=94 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309406][ C0] BUG: workqueue lockup - pool cpus=95 node=0 flags=0x4 nice=0 stuck for 237s!
[ 243.309413][ C0] BUG: workqueue lockup - pool cpus=96 node=0 flags=0x4 nice=0 stuck for 237s!

Git bisect identified this as the first bad commit:

commit 61bbcfb50514a8a94e035a7349697a3790ab4783
Author: Paul E. McKenney <[email protected]>
Date:   Fri Mar 20 20:29:20 2026 -0700

    srcu: Push srcu_node allocation to GP when non-preemptible

    When the srcutree.convert_to_big and srcutree.big_cpu_lim kernel boot
    parameters specify initialization-time allocation of the srcu_node
    tree for statically allocated srcu_struct structures (for example, in
    DEFINE_SRCU() at build time instead of init_srcu_struct() at
    runtime), init_srcu_struct_nodes() will attempt to dynamically
    allocate this tree
    at the first run-time update-side use of this srcu_struct structure,
    but while holding a raw spinlock. Because the memory allocator can
    acquire non-raw spinlocks, this can result in lockdep splats.
This commit therefore uses the same SRCU_SIZE_ALLOC trick that isused
    when the first run-time update-side use of this srcu_struct structure
    happens before srcu_init() is called. The actual allocation then
    takes place from workqueue context at the ends of upcoming SRCU grace
    periods.

    [boqun: Adjust the sha1 of the Fixes tag]

    Fixes: 175b45ed343a ("srcu: Use raw spinlocks so call_srcu() can be used under preempt_disable()")
    Signed-off-by: Paul E. McKenney <[email protected]>
    Signed-off-by: Boqun Feng <[email protected]>

 kernel/rcu/srcutree.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Reverting this commit resolves the issue.
The problem appears to be that the workqueue is attempting to execute
on offline CPUs. The commit moves SRCU node allocation to workqueue
context to avoid lockdep issues with memory allocation under raw
spinlocks, which makes sense. However, it seems the workqueue
scheduling doesn't properly account for CPU online/offline state in
this code path.

My test environment:
- Architecture: PowerPC
- Kernel version: Latest upstream (7.1-rc1)
- CPUs 81-96 are offline at boot time

I suspect the issue might be related to:
1. Workqueue not checking CPU online status before scheduling SRCU
   allocation work
2. Missing CPU hotplug awareness in the new workqueue-based allocation
   path
3. Possible race condition with CPU hotplug events

Would it make sense to use queue_work_on() with explicit online CPU
selection, or add CPU hotplug handlers for this workqueue? I'm not
deeply familiar with the workqueue internals, so I might be missing
something.

Please let me know if you need any additional details or if you'd like
me to test any patches.

If you fix the above issue, please add the tag below.
Reported-by: Samir M <[email protected]>


Thanks,
Samir


Hi Paul,

I worked on fixing the issue and introduced the changes below. With
these updates, I no longer observe any workqueue lockup messages for
offline CPUs.

Could you please review the changes and share your feedback?

Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
non-preemptible") introduced workqueue lockups on systems with offline
CPUs. The issue occurs because srcu_queue_delayed_work_on() calls
queue_work_on() with sdp->cpu, which may be offline, so the work sits
in that CPU's pool without making progress, as the lockup reports
above show.

This patch fixes the issue by checking if the target CPU is online
before queuing work on it. If the CPU is offline, we fall back to
using queue_work() which will schedule the work on any available
online CPU.

Fixes: 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when non-preemptible")


Signed-off-by: Samir <[email protected]>
---
 kernel/rcu/srcutree.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0d01cd8c4b4a..55a90dd4a030 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -869,10 +869,15 @@ static void srcu_delay_timer(struct timer_list *t)
 static void srcu_queue_delayed_work_on(struct srcu_data *sdp,
 unsigned long delay)
 {
-       if (!delay) {
+       if (!delay && cpu_online(sdp->cpu)) {
                queue_work_on(sdp->cpu, rcu_gp_wq, &sdp->work);
                return;
+       } else if (!delay) {
+               /* CPU is offline, queue on any available CPU */
+               queue_work(rcu_gp_wq, &sdp->work);
+               return;
+       }

        timer_reduce(&sdp->delay_work, jiffies + delay);
 }
--
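
For readers who want to reason about the !delay branch outside the
kernel, here is a minimal userspace sketch of the decision the patch
makes. It is not kernel code: model_online[], model_set_online(),
model_pick_target(), MODEL_NR_CPUS and MODEL_UNBOUND are hypothetical
names invented for this illustration. The array stands in for the
kernel's online mask, and MODEL_UNBOUND stands in for plain
queue_work(), which hands the work to the workqueue code with
WORK_CPU_UNBOUND instead of a fixed CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 128    /* hypothetical CPU count for this model */
#define MODEL_UNBOUND (-1)   /* models queue_work(): no fixed target CPU */

/* Hypothetical stand-in for the kernel's online CPU mask. */
static bool model_online[MODEL_NR_CPUS];

/* Mark one CPU online or offline in the model. */
void model_set_online(int cpu, bool online)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_online[cpu] = online;
}

/*
 * Model of the !delay branch in the patched
 * srcu_queue_delayed_work_on(): keep the requested CPU when it is
 * online (the queue_work_on() case), otherwise fall back to
 * MODEL_UNBOUND (the queue_work() case).
 */
int model_pick_target(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_online[cpu] ? cpu : MODEL_UNBOUND;
}
```

With CPUs 81-96 marked offline, as on the reporting system,
model_pick_target(81) yields MODEL_UNBOUND while model_pick_target(80)
yields 80. The model covers only the decision itself; in the kernel,
the cpu_online() check and the subsequent queueing are not atomic with
respect to CPU hotplug.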


Thanks,
Samir
