Hi Xiao,

Thank you very much for your detailed analysis and verification.

On 2026/6/11 23:53, XIAO WU wrote:
> Hi Tengda,
> 
> Sashiko [1] reviewed this patch and found that removing the
> task_is_running() check exposes stack unwinders to real crashes — not
> just "invalid information."  A PoC confirms this: a KASAN panic triggers
> within seconds when /proc/<pid>/stack reads the stack of a task that is
> concurrently running a kretprobe.
> 
> [1] 
> https://sashiko.dev/#/patchset/20260610013658.1837963-1-wutengda%40huaweicloud.com
> 
>> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
>> index 5a8bdf88999a..1e7fdebe3cd5 100644
>> --- a/kernel/trace/rethook.c
>> +++ b/kernel/trace/rethook.c
>> @@ -250,9 +251,6 @@ unsigned long rethook_find_ret_addr(struct task_struct 
>> *tsk, unsigned long frame
>>      if (WARN_ON_ONCE(!cur))
>>          return 0;
>>
>> -    if (tsk != current && task_is_running(tsk))
>> -        return 0;
>> -
>>      do {
>>          ret = __rethook_find_ret_addr(tsk, cur);
>>          if (!ret)
> 
> The commit message states:
> 
>> The iteration is already safe from crashes because
>> unwind_next_frame() holds RCU and rethook_node structures are
>> RCU-freed; even if the iteration goes off the rails and returns
>> invalid information, it will not crash.
> 
> There are two problems with this claim, both reproducible.
> 
> **Problem 1: stack-out-of-bounds in unwind_next_frame itself**
> 
> The PoC below reliably triggers the following KASAN panic — not in the
> rethook list traversal, but inside unwind_next_frame():
> 
> [ 1833.494623] BUG: KASAN: stack-out-of-bounds in 
> unwind_next_frame+0x861/0x2080
> [ 1833.494651] Read of size 2 at addr ffffc90003e6f5f0 by task poc/9854
> [ 1833.494707] Call Trace:
> [ 1833.494719]  dump_stack_lvl+0x116/0x1f0
> [ 1833.494743]  print_report+0xf4/0x600
> [ 1833.494788]  kasan_report+0xe0/0x110
> [ 1833.494836]  unwind_next_frame+0x861/0x2080
> [ 1833.494948]  arch_stack_walk+0x99/0x100
> [ 1833.495000]  stack_trace_save_tsk+0x16a/0x200
> [ 1833.495054]  proc_pid_stack+0x173/0x2b0
> [ 1833.495103]  seq_read_iter+0x519/0x12d0
> [ 1833.495166]  seq_read+0x3b7/0x590
> [ 1833.495297]  vfs_read+0x1f5/0xd20
> [ 1833.495497]  ksys_read+0x135/0x250
> [ 1833.495549]  do_syscall_64+0x129/0x850
> [ 1833.495566]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [ 1833.498894] Kernel panic - not syncing: KASAN: panic_on_warn set ...
> 
> page last free pid 9737 tgid 9737 stack trace:
>  do_sys_openat2+0xbf/0x260          <-- target task inside kretprobe
>  __x64_sys_openat+0x179/0x210
> 
> This crash has nothing to do with rethook_node lifetimes or RCU.  It
> happens because the ORC unwinder reads stack memory while the target
> task concurrently executes a kretprobe trampoline that modifies return
> addresses.  The unwinder follows corrupted frame data past valid stack
> boundaries.  RCU protection of rethook_node structures is irrelevant —
> this crash occurs at the stack frame interpretation level, before any
> rethook list traversal.
> 
> The old task_is_running() check prevented the unwinder from attempting
> to unwind a running task's stack in the first place.
> 

Yes, I was able to reproduce the issue locally using your PoC. The problem
does exist as you described. I need to take a deeper look and figure out how
to properly fix it.

> **Problem 2: use-after-free via rethook_node recycling**
> 
> Even if the stack-out-of-bounds above were addressed, a second crash
> path exists in the rethook list traversal itself.
> 
> rethook_recycle() immediately pushes nodes back to the objpool without
> an RCU grace period:
> 
>   kernel/trace/rethook.c:
>   void rethook_recycle(struct rethook_node *node)
>   {
>           ...
>           objpool_push(node, &node->rethook->pool);
>   }
> 
> Meanwhile, unwind_next_frame() in arch/x86/kernel/unwind_orc.c drops
> RCU between frames while the cursor (*cur) persists across iterations:
> 
>   arch/x86/kernel/unwind_orc.c:
>   bool unwind_next_frame(...)
>   {
>           ...
>           guard(rcu)();    // RCU held for one frame
>           ...
>   }                        // RCU dropped here
> 
> When the unwinder calls __rethook_find_ret_addr() in the next frame
> iteration, it does:
> 
>   struct llist_node *first = tsk->rethooks.first;
>   ...
>   *cur = first;
>   ...
>   node = node->next;       // node may have been recycled
> 
> If the target task returns from a probed function between frames, its
> rethook_node is recycled and can be instantly reallocated to another
> task.  The unwinder's stale cursor then dereferences a freed pointer,
> leading to use-after-free.
> 

Yes, Sashiko also pointed this out. You have opened this issue for further
analysis to clarify its fault model. It appears to be a pre-existing issue
that may require a separate patch to resolve.

> ## Reproducer
> 
> The PoC sets up a kretprobe on do_sys_openat2, creates hot-loop threads
> calling open(), and concurrently reads /proc/<tid>/stack.  The race
> triggers within seconds (Problem 1 above; Problem 2 may reproduce on
> kernels without KASAN or with different timing).
> 
> Build:  gcc -static -pthread -o poc poc.c
> Run:    ./poc [runtime_seconds]
> Needs:  root, CONFIG_KASAN=y
> 
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/wait.h>
> #include <sys/syscall.h>
> #include <sched.h>
> #include <fcntl.h>
> #include <errno.h>
> #include <signal.h>
> #include <pthread.h>
> #include <dirent.h>
> 
> #define TRACE "/sys/kernel/tracing"
> 
> volatile int stop = 0;
> 
> static int tfs(const char *f, const char *b)
> {
>     char p[256]; int fd, r;
>     snprintf(p, 256, "%s/%s", TRACE, f);
>     fd = open(p, O_WRONLY | O_TRUNC);
>     if (fd < 0) {
>         system("mount -t tracefs tracefs /sys/kernel/tracing 2>/dev/null");
>         usleep(50000);
>         fd = open(p, O_WRONLY | O_TRUNC);
>     }
>     if (fd < 0) return -1;
>     r = write(fd, b, strlen(b));
>     close(fd);
>     return r < 0 ? -1 : 0;
> }
> 
> void *hot_thread(void *arg)
> {
>     while (!__atomic_load_n(&stop, __ATOMIC_RELAXED)) {
>         int fd = open("/dev/null", O_RDONLY);
>         if (fd >= 0) close(fd);
>     }
>     return NULL;
> }
> 
> void *reader_thread(void *arg)
> {
>     pid_t target = *(pid_t *)arg;
>     char path[64], buf[8192];
>     snprintf(path, 64, "/proc/%d/stack", target);
>     while (!__atomic_load_n(&stop, __ATOMIC_RELAXED)) {
>         int fd = open(path, O_RDONLY);
>         if (fd >= 0) { read(fd, buf, 8191); close(fd); }
>     }
>     return NULL;
> }
> 
> void sigh(int s) { stop = 1; }
> 
> int main(int argc, char *argv[])
> {
>     int runtime = 120;
>     if (argc > 1) runtime = atoi(argv[1]);
> 
>     printf("rethook race PoC\n");
>     if (geteuid()) { printf("root needed\n"); return 1; }
>     signal(SIGINT, sigh);
> 
>     pthread_t hot[4], rdr[4];
>     pid_t hot_tids[4];
>     int pairs = 4;
> 
>     for (int c = 0; c < runtime / 5 && !stop; c++) {
>         tfs("events/kprobes/myretprobe/enable", "0");
>         tfs("kprobe_events", "-:myretprobe");
>         usleep(100);
>         tfs("kprobe_events", "r:myretprobe do_sys_openat2 $retval");
>         tfs("events/kprobes/myretprobe/enable", "1");
> 
>         pid_t main_tid = syscall(SYS_gettid);
> 
>         for (int i = 0; i < pairs; i++)
>             pthread_create(&hot[i], NULL, hot_thread, NULL);
> 
>         usleep(300000);
> 
>         {
>             DIR *d = opendir("/proc/self/task");
>             int cnt = 0;
>             if (d) {
>                 struct dirent *de;
>                 while ((de = readdir(d)) != NULL && cnt < pairs) {
>                     pid_t t = atoi(de->d_name);
>                     if (t > 0 && t != main_tid)
>                         hot_tids[cnt++] = t;
>                 }
>                 closedir(d);
>             }
>             for (int i = 0; i < cnt; i++)
>                 pthread_create(&rdr[i], NULL, reader_thread, &hot_tids[i]);
>         }
> 
>         printf("round %d\n", c);
>         sleep(5);
> 
>         stop = 1;
>         usleep(100000);
> 
>         for (int i = 0; i < pairs; i++) pthread_join(hot[i], NULL);
>         for (int i = 0; i < pairs; i++) pthread_join(rdr[i], NULL);
> 
>         stop = 0;
>         usleep(1000);
>     }
> 
>     tfs("events/kprobes/myretprobe/enable", "0");
>     tfs("kprobe_events", "-:myretprobe");
>     printf("Done\n");
>     return 0;
> }
> 
> ## Summary
> 
> The v4 commit message claims the iteration "will not crash," but the PoC
> demonstrates a reproducible KASAN panic:
> 
> 1. stack-out-of-bounds in unwind_next_frame (ORC unwinder reads
>    concurrently-modified stack frames of a running task)
> 
> 2. Potential use-after-free in __rethook_find_ret_addr (rethook nodes
>    recycled without RCU grace period, cursor persists across RCU drops)
> 
> The old task_is_running() check was racy but served as a practical
> safety net.  Removing it without adding equivalent protection in the
> callers (proc_pid_stack, BPF stack walkers) exposes users to kernel
> panics via /proc/<pid>/stack on any task running a kretprobe.

Once again, I truly appreciate your thorough review and testing.
I'm not sure if I can fully resolve these issues, but if I succeed, I will
send out a v5.

Best regards,
Tengda


Reply via email to