Re: [RFC PATCH 0/4] trace, livepatch: Allow kprobe return overriding for livepatched functions

Song Liu Mon, 06 Apr 2026 19:47:06 -0700

On Mon, Apr 6, 2026 at 7:22 PM Yafang Shao <[email protected]> wrote:
>
> On Tue, Apr 7, 2026 at 2:26 AM Song Liu <[email protected]> wrote:
> >
> > On Mon, Apr 6, 2026 at 3:55 AM Yafang Shao <[email protected]> wrote:
> > >
> > > On Sat, Apr 4, 2026 at 12:07 AM Song Liu <[email protected]> wrote:
> > > >
> > > > Hi Yafang,
> > > >
> > > > On Thu, Apr 2, 2026 at 2:26 AM Yafang Shao <[email protected]> wrote:
> > > > >
> > > > > Livepatching allows for rapid experimentation with new kernel features
> > > > > > without interrupting production workloads. However, static livepatches lack
> > > > > > the flexibility required to tune features based on task-specific attributes,
> > > > > such as cgroup membership, which is critical in multi-tenant k8s
> > > > > environments. Furthermore, hardcoding logic into a livepatch prevents
> > > > > dynamic adjustments based on the runtime environment.
> > > > >
> > > > > > To address this, we propose a hybrid approach using BPF. Our production use
> > > > > > case involves:
> > > > >
> > > > > 1. Deploying a Livepatch function to serve as a stable BPF hook.
> > > > >
> > > > > > 2. Utilizing bpf_override_return() to dynamically modify the return value
> > > > > >    of that hook based on the current task's context.
> > > >
> > > > Could you please provide a specific use case that can benefit from this?
> > > > AFAICT, livepatch is more flexible but risky (may cause crash); while
> > > > BPF is safe, but less flexible. The combination you are proposing seems
> > > > to get the worst of the two sides. Maybe it can indeed get the benefit of
> > > > both sides in some cases, but I cannot think of such examples.
> > > >
> > >
> > > Here is an example we recently deployed on our production servers:
> > >
> > >   https://lore.kernel.org/bpf/caloahbdnnba_w_nwh3-s9gaxw0+vkulth1gy5hy9yqgeo4c...@mail.gmail.com/
> > >
> > > In one of our specific clusters, we needed to send BGP traffic out
> > > through specific NICs based on the destination IP. To achieve this
> > > without interrupting service, we live-patched
> > > bond_xmit_3ad_xor_slave_get(), added a new hook called
> > > bond_get_slave_hook(), and then ran a BPF program attached to that
> > > hook to select the outgoing NIC from the SKB. This allowed us to
> > > rapidly deploy the feature with zero downtime.
> >
> > I guess the idea here is: keep the risky part simple and implement
> > it in a module/livepatch, then use BPF, which is safe, for the
> > flexible and programmable part.
>
> Right
>
> >
> > Can we use struct_ops instead of bpf_override_return for this case?
> > This should make the solution more flexible.
>
> Upstreaming struct_ops based BPF hooks is a challenging process, as
> seen in these examples:
>
>   https://lwn.net/Articles/1054030/
>   https://lwn.net/Articles/1043548/
>
> Even when successful, upstreaming can take a significant amount of
> time—often longer than our production requirements allow. To bridge
> this gap, we developed this livepatch+BPF solution. This allows us to
> rapidly deploy new features without maintaining custom hooks in our
> local kernel. Because these livepatch-based hooks are lightweight,
> they minimize maintenance overhead and simplify kernel upgrades (e.g.,
> from 6.1 to 6.18).


I didn't mean upstream struct_ops.

We can define the struct_ops in an OOT kernel module. Then we
can attach BPF programs to the struct_ops. We may need
livepatch to connect the new struct_ops to original kernel logic.

I think the kernel side of this solution is mostly available, but we may
need some work on the toolchain side.
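
To make the shape of this concrete, here is a rough user-space model of
the dispatch pattern, not kernel code. Every name in it is hypothetical:
the ops table stands in for a struct_ops type defined by the OOT module,
pin_bgp_peer() stands in for a BPF program attached to it, and
patched_select() stands in for the livepatched function that connects
the ops to the original logic. Real struct_ops registration and the
bonding internals are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical ops table. In the real setup this would be a struct_ops
 * type defined by the out-of-tree module and implemented by BPF. */
struct slave_select_ops {
	/* Return a slave index >= 0, or a negative value for "no opinion". */
	int (*select_slave)(unsigned int dst_ip, int nr_slaves);
};

/* NULL means nothing is attached. */
static const struct slave_select_ops *active_ops;

/* Attach an implementation; only one may be attached at a time. */
static int ops_register(const struct slave_select_ops *ops)
{
	if (active_ops)
		return -1;
	active_ops = ops;
	return 0;
}

static void ops_unregister(void)
{
	active_ops = NULL;
}

/* Stand-in for the original kernel logic (a trivial hash here). */
static int default_select(unsigned int dst_ip, int nr_slaves)
{
	return (int)(dst_ip % (unsigned int)nr_slaves);
}

/* Stand-in for the livepatched function: consult the ops if attached,
 * otherwise (or on an invalid answer) fall back to the original logic. */
static int patched_select(unsigned int dst_ip, int nr_slaves)
{
	const struct slave_select_ops *ops = active_ops;

	if (ops && ops->select_slave) {
		int idx = ops->select_slave(dst_ip, nr_slaves);

		if (idx >= 0 && idx < nr_slaves)
			return idx;
	}
	return default_select(dst_ip, nr_slaves);
}

/* Stand-in for the BPF program: pin 10.0.0.1 to slave 2. */
static int pin_bgp_peer(unsigned int dst_ip, int nr_slaves)
{
	(void)nr_slaves;
	return dst_ip == 0x0a000001u ? 2 : -1;
}

static const struct slave_select_ops pin_ops = {
	.select_slave = pin_bgp_peer,
};
```

With nothing attached, patched_select() behaves like the original
function. After ops_register(&pin_ops), only the pinned destination
changes, and every other destination still takes the default path. The
livepatch part stays small, and the policy lives in the attached
implementation.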

Does this make sense?

Thanks,
Song

> That said, we would still prefer to have our hooks accepted upstream
> to eliminate the need for self-maintenance entirely.
