Hello BPF and livepatch teams,
This is something of a follow-up to
https://lists.ubuntu.com/archives/kernel-team/2025-October/163881.html
as we continue to encounter issues and conflicts between BPF and livepatch.
We've encountered an issue between BPF fentry/fexit trampolines and
kernel livepatching (kpatch/livepatch) on x86_64 systems with ORC
unwinder enabled. I'm reaching out to understand if this is a known
limitation and to explore potential solutions. I assume it is known, since
I see information along these lines in
https://www.kernel.org/doc/Documentation/livepatch/reliable-stacktrace.rst
Problem Summary
When BPF programs attach to kernel functions using fentry/fexit hooks,
the resulting JIT-compiled trampolines lack ORC unwind metadata. This
causes the livepatch transition to stall when threads are blocked in
hooked functions, because the unwinder marks their stacks as unreliable.
In our case the environment is:
- RHEL 9.6 (kernel 5.14.0-570.17.1.el9_6.x86_64)
- CONFIG_UNWINDER_ORC=y
- CONFIG_BPF_JIT_ALWAYS_ON=y
- BPF fentry/fexit hooks on inet_recvmsg()
Scenario:
1. BPF program attached to inet_recvmsg via fentry/fexit (creates BPF
trampoline)
2. CIFS filesystem mounted (creates cifsd kernel thread)
3. cifsd thread blocks in inet_recvmsg → BPF trampoline is on the stack
4. Attempt to load kpatch module
5. Livepatch transition stalls indefinitely
Error Message (repeated every ~1 second):
livepatch: klp_try_switch_task: cifsd:2886 has an unreliable stack
Stack trace showing BPF trampoline:
cifsd D 0 2886
Call Trace:
wait_woken+0x50/0x60
sk_wait_data+0x176/0x190
tcp_recvmsg_locked+0x234/0x920
tcp_recvmsg+0x78/0x210
inet_recvmsg+0x5c/0x140
bpf_trampoline_6442469985+0x89/0x130 ← NO ORC metadata
sock_recvmsg+0x95/0xa0
cifs_readv_from_socket+0x1ca/0x2d0 [cifs]
...
As far as I understand (please correct me if I'm wrong), the failure
occurs in arch/x86/kernel/unwind_orc.c:
	orc = orc_find(state->signal ? state->ip : state->ip - 1);
	if (!orc) {
		/*
		 * As a fallback, try to assume this code uses a frame pointer.
		 * This is useful for generated code, like BPF, which ORC
		 * doesn't know about. This is just a guess, so the rest of
		 * the unwind is no longer considered reliable.
		 */
		orc = &orc_fp_entry;
		state->error = true;	// ← Marks stack as unreliable
	}
When orc_find() returns NULL for the BPF trampoline address, the
unwinder falls back to frame pointers and marks the stack unreliable.
This causes arch_stack_walk_reliable() to fail, which in turn causes
livepatch's klp_check_stack() to return -EINVAL before even checking if
to-be-patched functions are on the stack.
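To make the sequence above concrete, here is a small userspace model of it.
This is only a sketch of my understanding, not kernel code: every model_*
name is hypothetical, and the has_orc flag simply stands in for whether
orc_find() would return an entry for that frame.

```c
/* Userspace model of the reliability check described above; NOT kernel code. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_frame {
	const char *symbol;	/* name as shown in the call trace */
	bool has_orc;		/* assumption: would orc_find() succeed here? */
};

/* Mirrors the sticky error flag the unwinder sets on a failed lookup. */
struct model_unwind_state {
	bool error;
};

static void model_unwind_next_frame(struct model_unwind_state *state,
				    const struct model_frame *f)
{
	/* No ORC entry: the unwinder guesses, so the result is unreliable. */
	if (!f->has_orc)
		state->error = true;
}

/* Models the reliable stack walk: any unreliable frame fails the walk. */
static int model_stack_walk_reliable(const struct model_frame *stack, size_t n)
{
	struct model_unwind_state state = { .error = false };

	for (size_t i = 0; i < n; i++) {
		model_unwind_next_frame(&state, &stack[i]);
		if (state.error)
			return -EINVAL;
	}
	return 0;
}

/* Models the livepatch stack check for a single task. */
static int model_klp_check_stack(const struct model_frame *stack, size_t n,
				 const char *patched_func)
{
	int ret = model_stack_walk_reliable(stack, n);

	if (ret)
		return ret;	/* bail out before looking at any function */

	for (size_t i = 0; i < n; i++)
		if (strcmp(stack[i].symbol, patched_func) == 0)
			return -EAGAIN;	/* patched function is on the stack */
	return 0;
}
```

In this model, a stack containing a single frame without ORC data fails
with -EINVAL even when the function being patched is nowhere on the stack,
which matches the behaviour we observe with cifsd.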
Key observations:
1. The kernel comment explicitly mentions "generated code, like BPF"
2. Documentation/livepatch/reliable-stacktrace.rst lists "Dynamically
generated code (e.g. eBPF)" as causing unreliable stacks
3. Native kernel functions have ORC metadata from objtool during build
4. Ftrace trampolines have special ORC handling via orc_ftrace_find()
5. BPF JIT trampolines have no such handling. Is this correct?
Impact
This affects production systems where:
- Security/observability tools use BPF fentry/fexit hooks
- Live kernel patching is required for security updates
- Kernel threads may be blocked in hooked network/storage functions
The livepatch transition can stall for 60+ seconds before failing,
blocking critical security patches.
Questions for the Community
1. Is this a known limitation (I assume yes)?
2. Runtime ORC generation? Could the BPF JIT generate ORC unwind entries
for trampolines, similar to how ftrace trampolines are handled?
3. Trampoline registration? Could BPF trampolines register their address
ranges with the ORC unwinder to avoid the "unreliable" marking?
4. Alternative unwinding? Could livepatch use an alternative unwinding
method when BPF trampolines are detected (e.g., frame pointers with
validation)?
5. Workarounds? I mention one below, and I would be happy to hear if
anyone has a better idea to propose.
The only possible workaround I see is switching everything from
trampoline-based hooks to kprobes, since I assume kprobes won't have this
issue.
BPF kprobes use the ftrace infrastructure with kprobe_ftrace_handler,
which has ORC metadata and special handling in the unwinder. The stack
remains reliable:
inet_recvmsg+0x50/0x140 ← Has ORC metadata
kprobe_ftrace_handler+... ← Has ORC metadata
The problem with kprobes is, obviously, their performance penalty
compared to fentry/fexit.
Additional Context
From arch/x86/net/bpf_jit_comp.c:3559:
bool bpf_jit_supports_exceptions(void)
{
	/* We unwind through both kernel frames (starting from within bpf_throw
	 * call) and BPF frames. Therefore we require ORC unwinder to be enabled
	 * to walk kernel frames and reach BPF frames in the stack trace.
	 */
	return IS_ENABLED(CONFIG_UNWINDER_ORC);
}
This shows that BPF already has some integration with ORC for exception
handling. Could this be extended to trampolines?
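To illustrate what I mean by extending this to trampolines, here is a
userspace sketch of the "register the address range" idea from the
questions above. None of this exists in the kernel as far as I know: all
model_* names are made up, and the sketch assumes a registered trampoline
has a known, fixed frame layout, so that a frame inside a registered range
could be unwound without guessing.

```c
/* Hypothetical userspace model of trampoline range registration. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_RANGES 16

struct model_tramp_range {
	uintptr_t start;	/* first byte of the trampoline image */
	uintptr_t end;		/* one past the last byte (exclusive) */
};

static struct model_tramp_range model_ranges[MODEL_MAX_RANGES];
static size_t model_nr_ranges;

/* Would be called when a trampoline image is created (assumption). */
static bool model_register_trampoline(uintptr_t start, size_t size)
{
	if (size == 0 || model_nr_ranges == MODEL_MAX_RANGES)
		return false;

	model_ranges[model_nr_ranges].start = start;
	model_ranges[model_nr_ranges].end = start + size;
	model_nr_ranges++;
	return true;
}

static bool model_ip_in_trampoline(uintptr_t ip)
{
	for (size_t i = 0; i < model_nr_ranges; i++)
		if (ip >= model_ranges[i].start && ip < model_ranges[i].end)
			return true;
	return false;
}

/*
 * Decision the unwinder could make per frame: trust the frame if ORC data
 * exists, or if the address lies in a registered trampoline whose layout
 * is known. Anything else stays unreliable, as it is today.
 */
static bool model_frame_is_reliable(bool has_orc_entry, uintptr_t ip)
{
	if (has_orc_entry)
		return true;
	return model_ip_in_trampoline(ip);
}
```

The point of the sketch is only that the fallback path would get one extra
lookup; whether the trampoline layout is stable enough to be trusted at
every instruction boundary is exactly what I am unsure about.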
References
- Kernel: 5.14.0-570.17.1.el9_6.x86_64
- Code: arch/x86/kernel/unwind_orc.c:510-519
- Docs: Documentation/livepatch/reliable-stacktrace.rst lines 84-85, 111-112
I appreciate any guidance on whether this is something that could be
addressed in the kernel, or if we should focus on user-space workarounds.
Thanks,
Andrey