> On Jan 7, 2026, at 6:15 PM, Frederic Weisbecker <[email protected]> wrote:
> 
> Le Thu, Jan 01, 2026 at 11:34:10AM -0500, Joel Fernandes a écrit :
>> From: Yao Kai <[email protected]>
>> 
>> Commit 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in
>> __rcu_read_unlock()") removes the recursion-protection code from
>> __rcu_read_unlock(). Therefore, we could invoke the deadloop in
>> raise_softirq_irqoff() with ftrace enabled as follows:
>> 
>> WARNING: CPU: 0 PID: 0 at kernel/trace/trace.c:3021 
>> __ftrace_trace_stack.constprop.0+0x172/0x180
>> Modules linked in: my_irq_work(O)
>> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G O 6.18.0-rc7-dirty #23 
>> PREEMPT(full)
>> Tainted: [O]=OOT_MODULE
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 
>> 04/01/2014
>> RIP: 0010:__ftrace_trace_stack.constprop.0+0x172/0x180
>> RSP: 0018:ffffc900000034a8 EFLAGS: 00010002
>> RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
>> RDX: 0000000000000003 RSI: ffffffff826d7b87 RDI: ffffffff826e9329
>> RBP: 0000000000090009 R08: 0000000000000005 R09: ffffffff82afbc4c
>> R10: 0000000000000008 R11: 0000000000011d7a R12: 0000000000000000
>> R13: ffff888003874100 R14: 0000000000000003 R15: ffff8880038c1054
>> FS:  0000000000000000(0000) GS:ffff8880fa8ea000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000055b31fa7f540 CR3: 00000000078f4005 CR4: 0000000000770ef0
>> PKRU: 55555554
>> Call Trace:
>> <IRQ>
>> trace_buffer_unlock_commit_regs+0x6d/0x220
>> trace_event_buffer_commit+0x5c/0x260
>> trace_event_raw_event_softirq+0x47/0x80
>> raise_softirq_irqoff+0x6e/0xa0
>> rcu_read_unlock_special+0xb1/0x160
>> unwind_next_frame+0x203/0x9b0
>> __unwind_start+0x15d/0x1c0
>> arch_stack_walk+0x62/0xf0
>> stack_trace_save+0x48/0x70
>> __ftrace_trace_stack.constprop.0+0x144/0x180
>> trace_buffer_unlock_commit_regs+0x6d/0x220
>> trace_event_buffer_commit+0x5c/0x260
>> trace_event_raw_event_softirq+0x47/0x80
>> raise_softirq_irqoff+0x6e/0xa0
>> rcu_read_unlock_special+0xb1/0x160
>> unwind_next_frame+0x203/0x9b0
>> __unwind_start+0x15d/0x1c0
>> arch_stack_walk+0x62/0xf0
>> stack_trace_save+0x48/0x70
>> __ftrace_trace_stack.constprop.0+0x144/0x180
>> trace_buffer_unlock_commit_regs+0x6d/0x220
>> trace_event_buffer_commit+0x5c/0x260
>> trace_event_raw_event_softirq+0x47/0x80
>> raise_softirq_irqoff+0x6e/0xa0
>> rcu_read_unlock_special+0xb1/0x160
>> unwind_next_frame+0x203/0x9b0
>> __unwind_start+0x15d/0x1c0
>> arch_stack_walk+0x62/0xf0
>> stack_trace_save+0x48/0x70
>> __ftrace_trace_stack.constprop.0+0x144/0x180
>> trace_buffer_unlock_commit_regs+0x6d/0x220
>> trace_event_buffer_commit+0x5c/0x260
>> trace_event_raw_event_softirq+0x47/0x80
>> raise_softirq_irqoff+0x6e/0xa0
>> rcu_read_unlock_special+0xb1/0x160
>> __is_insn_slot_addr+0x54/0x70
>> kernel_text_address+0x48/0xc0
>> __kernel_text_address+0xd/0x40
>> unwind_get_return_address+0x1e/0x40
>> arch_stack_walk+0x9c/0xf0
>> stack_trace_save+0x48/0x70
>> __ftrace_trace_stack.constprop.0+0x144/0x180
>> trace_buffer_unlock_commit_regs+0x6d/0x220
>> trace_event_buffer_commit+0x5c/0x260
>> trace_event_raw_event_softirq+0x47/0x80
>> __raise_softirq_irqoff+0x61/0x80
>> __flush_smp_call_function_queue+0x115/0x420
>> __sysvec_call_function_single+0x17/0xb0
>> sysvec_call_function_single+0x8c/0xc0
>> </IRQ>
>> 
>> Commit b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
>> fixed the infinite loop in rcu_read_unlock_special() for IRQ work by
>> setting a flag before calling irq_work_queue_on(). We fix this issue by
>> setting the same flag before calling raise_softirq_irqoff() and rename the
>> flag to defer_qs_pending for more common.
>> 
>> Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in 
>> __rcu_read_unlock()")
>> Reported-by: Tengda Wu <[email protected]>
>> Signed-off-by: Yao Kai <[email protected]>
>> Reviewed-by: Joel Fernandes <[email protected]>
>> Signed-off-by: Joel Fernandes <[email protected]>
> 
> Looks good but, BTW, what happens if rcu_qs() is called
> before rcu_preempt_deferred_qs() had a chance to be called?

Could you provide an example of when that can happen? 

If rcu_qs() results in reporting of a quiescent state up the node tree before 
the deferred reporting work had a chance to act, then indeed we should be 
clearing the flag (after canceling the pending raise_softirq_irqoff()).

>> flag to defer_qs_pending for more common.
>> 
>> Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in 
>> __rcu_read_unlock()")
>> Reported-by: Tengda Wu <[email protected]>
>> Signed-off-by: Yao Kai <[email protected]>
>> Reviewed-by: Joel Fernandes <[email protected]>
>> Signed-off-by: Joel Fernandes <[email protected]>
> 
> Looks good but, BTW, what happens if rcu_qs() is called
> before rcu_preempt_deferred_qs() had a chance to be called?

Could you provide an example of when that can happen? 

As far as I can see, even if that were to happen, which I think you are right 
it can happen, we will still go through the path to report deferred quiescent 
states and cancel the pending work (reset the flag).

> current->rcu_read_unlock_special.b.need_qs is reset by rcu_qs()
> so subsequent calls to rcu_read_unlock() won't issue rcu_read_unlock_special()
> (unless the task is blocked). And further calls to rcu_preempt_deferred_qs()
> through rcu_core() will be ignored as well.

I am not sure if this implies that deferred quiescent state gets cancelled 
because we have already called unlock once. We have to go through the deferred 
quiescent state path on all subsequent quiescent state reporting, even if 
need_qs reset. How else will the GP complete.
> 
> But rdp->defer_qs_pending will remain in the DEFER_QS_PENDING state until
> the next grace period. And if rcu_read_unlock_special() is called again
> during the next GP on unfortunate place needing deferred qs, the state machine
> will spuriously assume that either rcu_core or the irq_work are pending, when
> none are anymore.
> 
> The state should be reset by rcu_qs().

In fact I would say if a deferred QS is pending, we should absolutely not reset 
its state from rcu_qs..

Maybe we should reset it from rcu_report_qs_rdp/rnp?

Unfortunately, all of this is coming from me being on a phone and not at a 
computer, so I will revise my response, but probably tomorrow, because today 
the human body is not cooperating. 

thanks,

 - Joel


> current->rcu_read_unlock_special.b.need_qs is reset by rcu_qs()
> so subsequent calls to rcu_read_unlock() won't issue rcu_read_unlock_special()
> (unless the task is blocked). And further calls to rcu_preempt_deferred_qs()
> through rcu_core() will be ignored as well.

I am not sure if this implies that deferred quiescent state gets cancelled 
because we have already called unlock once. We have to go through the deferred 
quiescent state path on all subsequent quiescent state reporting, even if 
need_qs reset. How else will the GP complete.
> 
> But rdp->defer_qs_pending will remain in the DEFER_QS_PENDING state until
> the next grace period. And if rcu_read_unlock_special() is called again
> during the next GP on unfortunate place needing deferred qs, the state machine
> will spuriously assume that either rcu_core or the irq_work are pending, when
> none are anymore.
> 
> The state should be reset by rcu_qs().

In fact I would say if a deferred QS is pending, we should absolutely not reset 
its state from rcu_qs..

Maybe we should reset it from rcu_report_qs_rdp/rnp?

thanks,

 - Joel


> 
> Thanks.
> 
> --
> Frederic Weisbecker
> SUSE Labs
> 

Reply via email to