Hi Tengda,
Sashiko [1] reviewed this patch and found that removing the
task_is_running() check exposes stack unwinders to real crashes — not
just "invalid information." A PoC confirms this: a KASAN panic triggers
within seconds when /proc/<pid>/stack reads the stack of a task that is
concurrently running a kretprobe.
[1] https://sashiko.dev/#/patchset/20260610013658.1837963-1-wutengda%40huaweicloud.com
> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> index 5a8bdf88999a..1e7fdebe3cd5 100644
> --- a/kernel/trace/rethook.c
> +++ b/kernel/trace/rethook.c
> @@ -250,9 +251,6 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
>  	if (WARN_ON_ONCE(!cur))
>  		return 0;
>
> -	if (tsk != current && task_is_running(tsk))
> -		return 0;
> -
>  	do {
>  		ret = __rethook_find_ret_addr(tsk, cur);
>  		if (!ret)
The commit message states:
> The iteration is already safe from crashes because
> unwind_next_frame() holds RCU and rethook_node structures are
> RCU-freed; even if the iteration goes off the rails and returns
> invalid information, it will not crash.
There are two problems with this claim. The PoC below reproduces the
first; the second follows from reading the code.
**Problem 1: stack-out-of-bounds in unwind_next_frame itself**
The PoC below reliably triggers the following KASAN panic — not in the
rethook list traversal, but inside unwind_next_frame():
[ 1833.494623] BUG: KASAN: stack-out-of-bounds in unwind_next_frame+0x861/0x2080
[ 1833.494651] Read of size 2 at addr ffffc90003e6f5f0 by task poc/9854
[ 1833.494707] Call Trace:
[ 1833.494719] dump_stack_lvl+0x116/0x1f0
[ 1833.494743] print_report+0xf4/0x600
[ 1833.494788] kasan_report+0xe0/0x110
[ 1833.494836] unwind_next_frame+0x861/0x2080
[ 1833.494948] arch_stack_walk+0x99/0x100
[ 1833.495000] stack_trace_save_tsk+0x16a/0x200
[ 1833.495054] proc_pid_stack+0x173/0x2b0
[ 1833.495103] seq_read_iter+0x519/0x12d0
[ 1833.495166] seq_read+0x3b7/0x590
[ 1833.495297] vfs_read+0x1f5/0xd20
[ 1833.495497] ksys_read+0x135/0x250
[ 1833.495549] do_syscall_64+0x129/0x850
[ 1833.495566] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 1833.498894] Kernel panic - not syncing: KASAN: panic_on_warn set ...
page last free pid 9737 tgid 9737 stack trace:
do_sys_openat2+0xbf/0x260 <-- target task inside kretprobe
__x64_sys_openat+0x179/0x210
This crash has nothing to do with rethook_node lifetimes or RCU. It
happens because the ORC unwinder reads stack memory while the target
task concurrently executes a kretprobe trampoline that modifies return
addresses. The unwinder then follows inconsistent frame data past valid
stack boundaries. The fault is at the stack frame interpretation level,
before any rethook list traversal, so RCU protection of rethook_node
structures cannot prevent it.
The old task_is_running() check prevented the unwinder from attempting
to unwind a running task's stack in the first place.
**Problem 2: use-after-free via rethook_node recycling**
Even if the stack-out-of-bounds above were addressed, a second crash
path exists in the rethook list traversal itself.
rethook_recycle() immediately pushes nodes back to the objpool without
an RCU grace period:
kernel/trace/rethook.c:
void rethook_recycle(struct rethook_node *node)
{
	...
	objpool_push(node, &node->rethook->pool);
}
Meanwhile, unwind_next_frame() in arch/x86/kernel/unwind_orc.c holds the
RCU read lock only for the duration of a single frame, while the cursor
(*cur) persists across iterations:
arch/x86/kernel/unwind_orc.c:
bool unwind_next_frame(...)
{
	...
	guard(rcu)();	// RCU held for one frame
	...
}			// RCU dropped here
When the unwinder calls __rethook_find_ret_addr() in the next frame
iteration, it does:
	struct llist_node *first = tsk->rethooks.first;
	...
	*cur = first;
	...
	node = node->next;	// node may have been recycled
If the target task returns from a probed function between frames, its
rethook_node is recycled and can be handed out to another task right
away. The unwinder's stale cursor then dereferences a node that is no
longer on the target's list, which is a use-after-free.
## Reproducer
The PoC sets up a kretprobe on do_sys_openat2, creates hot-loop threads
calling open(), and concurrently reads /proc/<tid>/stack. The race
triggers within seconds (Problem 1 above; Problem 2 may reproduce on
kernels without KASAN or with different timing).
Build: gcc -static -pthread -o poc poc.c
Run: ./poc [runtime_seconds]
Needs: root, CONFIG_KASAN=y
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <sched.h>
#include <fcntl.h>
#include <errno.h>
#include <signal.h>
#include <pthread.h>
#include <dirent.h>
#define TRACE "/sys/kernel/tracing"
volatile int stop = 0;
/* Write string b to tracefs file f, mounting tracefs if the open fails. */
static int tfs(const char *f, const char *b)
{
	char p[256];
	int fd, r;

	snprintf(p, sizeof(p), "%s/%s", TRACE, f);
	fd = open(p, O_WRONLY | O_TRUNC);
	if (fd < 0) {
		system("mount -t tracefs tracefs /sys/kernel/tracing 2>/dev/null");
		usleep(50000);
		fd = open(p, O_WRONLY | O_TRUNC);
	}
	if (fd < 0)
		return -1;
	r = write(fd, b, strlen(b));
	close(fd);
	return r < 0 ? -1 : 0;
}
/* Enter and leave do_sys_openat2 in a tight loop so the kretprobe keeps firing. */
void *hot_thread(void *arg)
{
	while (!__atomic_load_n(&stop, __ATOMIC_RELAXED)) {
		int fd = open("/dev/null", O_RDONLY);

		if (fd >= 0)
			close(fd);
	}
	return NULL;
}
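/* Repeatedly read /proc/<tid>/stack so the kernel unwinds the hot thread. */
void *reader_thread(void *arg)
{
	pid_t target = *(pid_t *)arg;
	char path[64], buf[8192];

	snprintf(path, sizeof(path), "/proc/%d/stack", target);
	while (!__atomic_load_n(&stop, __ATOMIC_RELAXED)) {
		int fd = open(path, O_RDONLY);

		if (fd >= 0) {
			/* The contents do not matter; the read triggers the unwind. */
			read(fd, buf, sizeof(buf) - 1);
			close(fd);
		}
	}
	return NULL;
}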
void sigh(int s) { stop = 1; }
int main(int argc, char *argv[])
{
	int runtime = 120;
	pthread_t hot[4], rdr[4];
	pid_t hot_tids[4];
	int pairs = 4;
	int cnt;	/* reader threads actually started this round */

	if (argc > 1)
		runtime = atoi(argv[1]);
	printf("rethook race PoC\n");
	if (geteuid()) {
		printf("root needed\n");
		return 1;
	}
	signal(SIGINT, sigh);
	/* Each round runs for about 5 seconds. */
	for (int c = 0; c < runtime / 5 && !stop; c++) {
		/* (Re)install the kretprobe on do_sys_openat2. */
		tfs("events/kprobes/myretprobe/enable", "0");
		tfs("kprobe_events", "-:myretprobe");
		usleep(100);
		tfs("kprobe_events", "r:myretprobe do_sys_openat2 $retval");
		tfs("events/kprobes/myretprobe/enable", "1");
		pid_t main_tid = syscall(SYS_gettid);

		for (int i = 0; i < pairs; i++)
			pthread_create(&hot[i], NULL, hot_thread, NULL);
		usleep(300000);
		/* Collect the hot threads' TIDs, then start one reader per TID. */
		cnt = 0;
		DIR *d = opendir("/proc/self/task");

		if (d) {
			struct dirent *de;

			while ((de = readdir(d)) != NULL && cnt < pairs) {
				pid_t t = atoi(de->d_name);

				if (t > 0 && t != main_tid)
					hot_tids[cnt++] = t;
			}
			closedir(d);
		}
		for (int i = 0; i < cnt; i++)
			pthread_create(&rdr[i], NULL, reader_thread, &hot_tids[i]);
		printf("round %d\n", c);
		sleep(5);
		stop = 1;
		usleep(100000);
		for (int i = 0; i < pairs; i++)
			pthread_join(hot[i], NULL);
		/* Join only the readers that were started (cnt may be < pairs). */
		for (int i = 0; i < cnt; i++)
			pthread_join(rdr[i], NULL);
		stop = 0;
		usleep(1000);
	}
	tfs("events/kprobes/myretprobe/enable", "0");
	tfs("kprobe_events", "-:myretprobe");
	printf("Done\n");
	return 0;
}
## Summary
The v4 commit message claims the iteration "will not crash," but there
are two crash paths, and the PoC reproduces the first as a KASAN panic:
1. stack-out-of-bounds in unwind_next_frame() (the ORC unwinder reads
concurrently modified stack frames of a running task)
2. Potential use-after-free in __rethook_find_ret_addr() (rethook nodes
are recycled without an RCU grace period, and the cursor persists across
RCU drops)
The old task_is_running() check was racy but served as a practical
safety net. Removing it without adding equivalent protection in the
callers (proc_pid_stack, BPF stack walkers) exposes users to kernel
panics via /proc/<pid>/stack on any task running a kretprobe.
Thanks,
Xiao