Hi Wen,

On Fri, 2026-05-22 at 01:40 +0800, Wen Yang wrote:
> Hi Gabriele, 
> 
> No specific reason for REL_SOFT — not intentional, reverting to 
> REL_HARD.
>  
> Reproduced the stall on the same config (PREEMPT_RT + 
> PROVE_LOCKING/PROVE_RCU). 
> 
> Root cause is a cleanup ordering bug in uprobe_detail_waiting.tc, 
> unrelated to REL_SOFT/REL_HARD:
>  
> 
>    # original cleanup (wrong order)
>    echo "-${UPROBE_TARGET}:${start_offset}" > "$TLOB_MONITOR"  # (A)
>    kill "$hog_pid"                                              # (B)
>
> (A) triggers synchronize_srcu() in the kernel. But tlob_target is stuck
> mid-uprobe_notify_resume holding an SRCU read lock, preempted by the
> FIFO-99 hog -> so the reader never finishes and (B) is never reached.
> rcuc/0 (a kthread on PREEMPT_RT) is also starved by the hog -> RCU stall.

Great that you found both the issue and the fix. I wonder why lockdep wasn't
more informative, but the hog was probably starving that too.


>    Fix: kill the hog first:
>
>    kill "$hog_pid"; wait "$hog_pid"
>    echo "-${UPROBE_TARGET}:${start_offset}" > "$TLOB_MONITOR"
> 
> On PREEMPT_RT the stall is triggered more reliably because rcuc/0
> runs as a preemptible kthread rather than in softirq, making it easier
> for the hog to monopolise the CPU long enough to hit the stall.
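
For reference, the ordering could be kept in one place with a small helper,
roughly as sketched below. This is only an illustration of the order described
above, not the actual test code: cleanup() is a hypothetical helper, the
defaults for TLOB_MONITOR, UPROBE_TARGET and start_offset are stand-ins (a
temporary file instead of the real monitor file), and "sleep" stands in for
the FIFO-99 hog.

```shell
#!/bin/sh
# Sketch only: the variable names come from the quoted test, the default
# values are placeholders so the snippet can run outside the test suite.
TLOB_MONITOR="${TLOB_MONITOR:-$(mktemp)}"      # stand-in, not the real monitor file
UPROBE_TARGET="${UPROBE_TARGET:-/bin/true}"    # placeholder target binary
start_offset="${start_offset:-0x0}"            # placeholder offset
hog_pid=""

# Hypothetical helper: stop the hog before removing the binding.
cleanup() {
	# 1. Kill and reap the hog so that preempted SRCU readers can run again.
	if [ -n "$hog_pid" ]; then
		kill "$hog_pid" 2>/dev/null
		wait "$hog_pid" 2>/dev/null
	fi
	# 2. Only now remove the binding; per the analysis above, this write
	#    ends up in synchronize_srcu() and needs the readers to finish.
	echo "-${UPROBE_TARGET}:${start_offset}" > "$TLOB_MONITOR"
}

sleep 1000 &        # placeholder for the real hog
hog_pid=$!

# ... test body would go here ...

cleanup
```

Calling the same helper from every exit path of the test would keep the
kill/wait step ahead of the removal even when the test bails out early.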
> 
> Thank you for the thorough review and valuable suggestions. We are 
> working through all of them and running the full test suite.
> We expect to post v3 within the next two days.

Alright, sounds good.

Thanks,
Gabriele

