In __modify_ftrace_direct(), register_ftrace_function_nolock() makes
tmp_ops visible in ftrace_ops_list before entry->direct is updated
under ftrace_lock. During this window any CPU entering the traced
function calls call_direct_funcs(), reads the old address from
direct_functions via RCU, and jumps to it via
arch_ftrace_set_direct_caller(). If the caller has already freed or
invalidated the old trampoline before calling modify_ftrace_direct(),
the CPU ends up executing freed memory: a use-after-free of executable
code.

The race window:

  CPU 0 (__modify_ftrace_direct)        CPU 1 (executing traced func)
  ──────────────────────────────        ──────────────────────────────
  register_ftrace_function_nolock()
    -> tmp_ops visible in ops_list
                                        call_direct_funcs()
                                          ftrace_find_rec_direct() -> old_addr
                                          arch_ftrace_set_direct_caller(old_addr)
                                          jump to old_addr  <- UAF if freed
  mutex_lock(&ftrace_lock)
  entry->direct = addr  <- too late
  mutex_unlock(&ftrace_lock)

Fix: update entry->direct under ftrace_lock BEFORE registering tmp_ops.
Any CPU that observes tmp_ops in ftrace_ops_list after this point will
already see the new address when it calls ftrace_find_rec_direct().
Add smp_wmb() between the store and the registration to ensure the
write is visible on weakly-ordered architectures before tmp_ops
becomes observable via ftrace_ops_list.
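
The intended ordering can be sketched in userspace C11 as a
store-then-publish pattern. This is only an analogy under stated
assumptions, not kernel code: struct direct_entry, ops_visible,
publish_new_direct() and read_direct() are hypothetical names, the
release store stands in for smp_wmb() followed by the registration,
and the acquire load on the reader side is assumed to have an
equivalent in the real ftrace read path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hash entry holding the direct address. */
struct direct_entry {
	_Atomic unsigned long direct;	/* models entry->direct */
};

/* Hypothetical flag modelling "tmp_ops is visible in ftrace_ops_list". */
static atomic_bool ops_visible;

/*
 * Writer: store the new address first, then publish.
 * Returns the previous address so a caller could roll back.
 */
static unsigned long publish_new_direct(struct direct_entry *e,
					unsigned long addr)
{
	unsigned long old = atomic_load_explicit(&e->direct,
						 memory_order_relaxed);

	atomic_store_explicit(&e->direct, addr, memory_order_relaxed);
	/* Release: the store above is ordered before the publication. */
	atomic_store_explicit(&ops_visible, true, memory_order_release);
	return old;
}

/*
 * Reader: only dereference the address once publication is observed.
 * Returns false if the ops is not visible yet.
 */
static bool read_direct(struct direct_entry *e, unsigned long *addr)
{
	if (!atomic_load_explicit(&ops_visible, memory_order_acquire))
		return false;
	*addr = atomic_load_explicit(&e->direct, memory_order_relaxed);
	return true;
}
```

In this model, a reader that sees ops_visible set can only load the
address stored before the publication, which is the property the
reordering in this patch aims for.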

On error from register_ftrace_function_nolock(), restore entry->direct
to old_addr since tmp_ops never became visible to other CPUs.

This affects all callers of __modify_ftrace_direct(), including:

  - modify_ftrace_direct(), used by kernel modules and live patching
  - modify_ftrace_direct_nolock(), used by BPF trampolines
    (kernel/bpf/trampoline.c), reachable with CAP_BPF + CAP_PERFMON

Fixes: 0567d6809440 ("ftrace: Add modify_ftrace_direct()")
Cc: Steven Rostedt <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Signed-off-by: Andrii Kuchmenko <[email protected]>
---
kernel/trace/ftrace.c | 35 +++++++++++++++++++++++++----------
1 file changed, 25 insertions(+), 10 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a1b2c3d4e5f6..b7c8d9e0f1a2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5950,6 +5950,7 @@ static int __modify_ftrace_direct(struct ftrace_ops *ops,
unsigned long addr)
struct ftrace_func_entry *entry;
struct ftrace_ops tmp_ops;
+ unsigned long old_addr;
int err;
lockdep_assert_held(&direct_mutex);
@@ -5960,22 +5961,36 @@ static int __modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
if (!entry)
return -ENODEV;
- /*
- * tmp_ops is registered into ftrace_ops_list here, making it
- * visible to all CPUs executing the traced function. However,
- * entry->direct is not updated until after this call returns,
- * leaving a window where CPUs read the stale (possibly freed)
- * direct call address via ftrace_find_rec_direct().
- */
- err = register_ftrace_function_nolock(&tmp_ops);
- if (err)
- return err;
-
+ /* Save old address in case we need to roll back on error. */
+ old_addr = entry->direct;
+
+ /*
+ * Update entry->direct BEFORE registering tmp_ops into
+ * ftrace_ops_list. This closes the race window where a CPU
+ * executing the traced function could read the old (potentially
+ * freed) direct call address between tmp_ops becoming visible
+ * and entry->direct being updated.
+ *
+ * Any CPU that observes tmp_ops in ftrace_ops_list after the
+ * smp_wmb() below is guaranteed to see the new address when
+ * it calls ftrace_find_rec_direct().
+ */
mutex_lock(&ftrace_lock);
entry->direct = addr;
mutex_unlock(&ftrace_lock);
+ /*
+ * Ensure entry->direct store is ordered before tmp_ops
+ * becomes visible via ftrace_ops_list on weakly-ordered archs.
+ */
+ smp_wmb();
+
+ err = register_ftrace_function_nolock(&tmp_ops);
+ if (err) {
+ /* tmp_ops never became visible; safe to restore old_addr. */
+ mutex_lock(&ftrace_lock);
+ entry->direct = old_addr;
+ mutex_unlock(&ftrace_lock);
+ return err;
+ }
+
/*
* Now that tmp_ops is registered and entry->direct is updated,
* unregister the original ops and clean up.
--
2.39.0