Hi Wen,
On Fri, 2026-05-22 at 01:40 +0800, Wen Yang wrote:
> Hi Gabriele,
>
> No specific reason for REL_SOFT — not intentional, reverting to
> REL_HARD.
>
> Reproduced the stall on the same config (PREEMPT_RT +
> PROVE_LOCKING/PROVE_RCU).
>
> Root cause is a cleanup ordering bug in uprobe_detail_waiting.tc,
> unrelated to REL_SOFT/REL_HARD:
>
>
> # original cleanup — wrong order
>
> echo "-${UPROBE_TARGET}:${start_offset}" > "$TLOB_MONITOR" # (A)
>
> kill "$hog_pid" # (B)
>
>
> (A) triggers synchronize_srcu() in the kernel. But tlob_target is stuck
> mid-uprobe_notify_resume holding an SRCU read lock, preempted by the
> FIFO-99 hog, so the reader never finishes and (B) is never reached.
> rcuc/0 (a kthread on PREEMPT_RT) is also starved by the hog, hence the
> RCU stall.
Great that you found both the issue and the solution. I wonder why lockdep
wasn't more informative, but the stall was probably frequent enough to
starve that too.
> Fix: kill the hog first:
>
>
> kill "$hog_pid"; wait "$hog_pid"
>
> echo "-${UPROBE_TARGET}:${start_offset}" > "$TLOB_MONITOR"
>
>
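A minimal sketch of that ordering wrapped in a cleanup helper, in case it
is useful. The stand-ins here are assumptions so the snippet runs anywhere
and are not taken from the actual test: a plain `sleep` replaces the
FIFO-99 hog, a temporary file replaces the real monitor file, and the
target path and offset are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-ins; the real test writes to the tracefs monitor
# file and runs a SCHED_FIFO hog instead.
TLOB_MONITOR=$(mktemp)        # stand-in for the monitor file
UPROBE_TARGET=/bin/true       # hypothetical target binary
start_offset=0x1234           # hypothetical probe offset

sleep 60 &                    # stand-in for the CPU hog
hog_pid=$!

cleanup() {
    # 1. Stop the hog first, so a preempted SRCU reader can run again.
    kill "$hog_pid" 2>/dev/null || true
    # wait reports the signal exit status of the killed child; ignore it.
    wait "$hog_pid" 2>/dev/null || true
    # 2. Only then remove the uprobe, which may block in synchronize_srcu().
    echo "-${UPROBE_TARGET}:${start_offset}" > "$TLOB_MONITOR"
}

cleanup
```

Keeping both steps in one function makes it harder to reorder them by
accident when the test grows more exit paths.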
> On PREEMPT_RT the stall is triggered more reliably because rcuc/0
> runs as a preemptible kthread rather than in softirq, making it easier
> for the hog to monopolise the CPU long enough to hit the stall.
>
> Thank you for the thorough review and valuable suggestions. We are
> working through all of them and running the full test suite.
> We expect to post v3 within the next two days.
Alright, sounds good.
Thanks,
Gabriele