On Mon, Apr 6, 2026 at 3:55 AM Yafang Shao <[email protected]> wrote:
>
> On Sat, Apr 4, 2026 at 12:07 AM Song Liu <[email protected]> wrote:
> >
> > Hi Yafang,
> >
> > On Thu, Apr 2, 2026 at 2:26 AM Yafang Shao <[email protected]> wrote:
> > >
> > > Livepatching allows for rapid experimentation with new kernel features
> > > without interrupting production workloads. However, static livepatches
> > > lack the flexibility required to tune features based on task-specific
> > > attributes, such as cgroup membership, which is critical in multi-tenant
> > > k8s environments. Furthermore, hardcoding logic into a livepatch prevents
> > > dynamic adjustments based on the runtime environment.
> > >
> > > To address this, we propose a hybrid approach using BPF. Our production
> > > use case involves:
> > >
> > > 1. Deploying a Livepatch function to serve as a stable BPF hook.
> > >
> > > 2. Utilizing bpf_override_return() to dynamically modify the return value
> > >    of that hook based on the current task's context.
> >
> > Could you please provide a specific use case that can benefit from this?
> > AFAICT, livepatch is more flexible but risky (may cause crash); while
> > BPF is safe, but less flexible. The combination you are proposing seems
> > to get the worst of both sides. Maybe it can indeed get the benefit of
> > both sides in some cases, but I cannot think of such examples.
> >
>
> Here is an example we recently deployed on our production servers:
>
>   https://lore.kernel.org/bpf/caloahbdnnba_w_nwh3-s9gaxw0+vkulth1gy5hy9yqgeo4c...@mail.gmail.com/
>
> In one of our specific clusters, we needed to send BGP traffic out
> through specific NICs based on the destination IP. To achieve this
> without interrupting service, we live-patched
> bond_xmit_3ad_xor_slave_get(), added a new hook called
> bond_get_slave_hook(), and then ran a BPF program attached to that
> hook to select the outgoing NIC from the SKB. This allowed us to
> rapidly deploy the feature with zero downtime.

I guess the idea here is: keep the risky part simple and implement it
in a module/livepatch, then use BPF to handle the flexible, programmable
part safely.

Can we use struct_ops instead of bpf_override_return for this case?
This should make the solution more flexible.
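
To make the question concrete, here is a rough userspace sketch of the
shape I have in mind. It is not kernel code and does not use the real
struct_ops API. All names in it (struct pkt, slave_select_ops,
pick_slave, by_dest_prefix) are made up for illustration, and the modulo
fallback is only a placeholder, not the bonding driver's hash. The point
is that the patched function stays small and validates whatever the
attached callback returns, while the callback gets typed arguments
instead of only overriding a return value.

```c
/* Userspace illustration only: none of this is kernel code. */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the packet data a selector would inspect. */
struct pkt {
	uint32_t daddr;	/* destination IPv4 address, host byte order */
};

/* Hypothetical ops table, similar in spirit to a struct_ops type. */
struct slave_select_ops {
	/* Return a slave index >= 0, or -1 to request the default path. */
	int (*select_slave)(const struct pkt *p, int nr_slaves);
};

/* Currently attached ops; NULL means nothing is attached. */
static const struct slave_select_ops *active_ops;

/* Placeholder default policy (not the real bonding hash). */
static int default_slave(const struct pkt *p, int nr_slaves)
{
	return (int)(p->daddr % (uint32_t)nr_slaves);
}

/* Stand-in for the livepatched function: small, fixed, easy to review. */
int pick_slave(const struct pkt *p, int nr_slaves)
{
	if (active_ops && active_ops->select_slave) {
		int idx = active_ops->select_slave(p, nr_slaves);

		/* Accept only in-range answers from the callback. */
		if (idx >= 0 && idx < nr_slaves)
			return idx;
	}
	return default_slave(p, nr_slaves);
}

/* Stand-in for the programmable part: send 10.0.0.0/8 via slave 1. */
static int by_dest_prefix(const struct pkt *p, int nr_slaves)
{
	(void)nr_slaves;
	return ((p->daddr >> 24) == 10) ? 1 : -1;
}

const struct slave_select_ops prefix_ops = {
	.select_slave = by_dest_prefix,
};
```

Setting active_ops = &prefix_ops corresponds to attaching the program,
and setting it back to NULL restores the default behavior. More
callbacks can be added to the table later without touching the patched
function's return-value convention.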

Thanks,
Song
