Re: [RFC PATCH 0/4] trace, livepatch: Allow kprobe return overriding for livepatched functions

Yafang Shao Mon, 06 Apr 2026 19:22:43 -0700

On Tue, Apr 7, 2026 at 2:26 AM Song Liu <[email protected]> wrote:
>
> On Mon, Apr 6, 2026 at 3:55 AM Yafang Shao <[email protected]> wrote:
> >
> > On Sat, Apr 4, 2026 at 12:07 AM Song Liu <[email protected]> wrote:
> > >
> > > Hi Yafang,
> > >
> > > On Thu, Apr 2, 2026 at 2:26 AM Yafang Shao <[email protected]> wrote:
> > > >
> > > > Livepatching allows for rapid experimentation with new kernel features
> > > > without interrupting production workloads. However, static livepatches 
> > > > lack
> > > > the flexibility required to tune features based on task-specific 
> > > > attributes,
> > > > such as cgroup membership, which is critical in multi-tenant k8s
> > > > environments. Furthermore, hardcoding logic into a livepatch prevents
> > > > dynamic adjustments based on the runtime environment.
> > > >
> > > > To address this, we propose a hybrid approach using BPF. Our production 
> > > > use
> > > > case involves:
> > > >
> > > > 1. Deploying a Livepatch function to serve as a stable BPF hook.
> > > >
> > > > 2. Utilizing bpf_override_return() to dynamically modify the return 
> > > > value
> > > >    of that hook based on the current task's context.
> > >
> > > Could you please provide a specific use case that can benefit from this?
> > > AFAICT, livepatch is more flexible but risky (may cause crash); while
> > > BPF is safe, but less flexible. The combination you are proposing seems
> > > to get the worse of the two sides. Maybe it can indeed get the benefit of
> > > both sides in some cases, but I cannot think of such examples.
> > >
> >
> > Here is an example we recently deployed on our production servers:
> >
> >   
> > https://lore.kernel.org/bpf/caloahbdnnba_w_nwh3-s9gaxw0+vkulth1gy5hy9yqgeo4c...@mail.gmail.com/
> >
> > In one of our specific clusters, we needed to send BGP traffic out
> > through specific NICs based on the destination IP. To achieve this
> > without interrupting service, we live-patched
> > bond_xmit_3ad_xor_slave_get(), added a new hook called
> > bond_get_slave_hook(), and then ran a BPF program attached to that
> > hook to select the outgoing NIC from the SKB. This allowed us to
> > rapidly deploy the feature with zero downtime.
>
> I guess the idea here is: keep the risk part simple, and implement
> it in module/livepatch, then use BPF for the flexible and programmable
> part safe.


Right

>
> Can we use struct_ops instead of bpf_override_return for this case?
> This should make the solution more flexible.

Upstreaming struct_ops based BPF hooks is a challenging process, as
seen in these examples:

  https://lwn.net/Articles/1054030/
  https://lwn.net/Articles/1043548/

Even when successful, upstreaming can take a significant amount of
time—often longer than our production requirements allow. To bridge
this gap, we developed this livepatch+BPF solution. This allows us to
rapidly deploy new features without maintaining custom hooks in our
local kernel. Because these livepatch-based hooks are lightweight,
they minimize maintenance overhead and simplify kernel upgrades (e.g.,
from 6.1 to 6.18).

That said, we would still prefer to have our hooks accepted upstream
to eliminate the need for self-maintenance entirely.

-- 
Regards
Yafang

Re: [RFC PATCH 0/4] trace, livepatch: Allow kprobe return overriding for livepatched functions

Reply via email to