On Fri, Apr 10, 2026 at 12:38 PM Masami Hiramatsu <[email protected]> wrote:
>
> Hi Yafang,
>
> On Thu,  2 Apr 2026 17:26:03 +0800
> Yafang Shao <[email protected]> wrote:
>
> > Livepatching allows for rapid experimentation with new kernel features
> > without interrupting production workloads. However, static livepatches lack
> > the flexibility required to tune features based on task-specific attributes,
> > such as cgroup membership, which is critical in multi-tenant k8s
> > environments. Furthermore, hardcoding logic into a livepatch prevents
> > dynamic adjustments based on the runtime environment.
> >
> > To address this, we propose a hybrid approach using BPF. Our production use
> > case involves:
> >
> > 1. Deploying a Livepatch function to serve as a stable BPF hook.
> >
> > 2. Utilizing bpf_override_return() to dynamically modify the return value
> >    of that hook based on the current task's context.
>
> First of all, I don't like this approach to test a new feature in the
> kernel, because it sounds like allowing multiple different generations
> of implementations to coexist simultaneously. The standard kernel code
> is not designed to withstand such implementations.

However, this approach is invaluable for rapidly deploying new kernel
features to production servers without downtime. Upgrading kernels
across a large fleet remains a significant challenge.

>
> For example, if you implement a well-designed framework in a specific
> subsystem, like Schedext, which allows multiple implementations extended
> with BPF to coexist, there's no problem (at least it's debatable).
>
> But if it is for any function, it is dangerous feature. Bugs that occur
> in kernels that use this functionality cannot be addressed here. They
> need to be treated the same way as out-of-tree drivers or forked kernels.
> I mean, add a tainted flag for this feature. And we don't care of it.

Agreed. This should be handled as an OOT module rather than part of
the core kernel.

>
> >
> > A significant challenge arises when atomic-replace is enabled. In this
> > mode, deploying a new livepatch changes the target function's address,
> > forcing a re-attachment of the BPF program. This re-attachment latency is
> > unacceptable in critical paths, such as those handling networking policies.
> >
> > To solve this, we introduce a hybrid livepatch mode that allows specific
> > patches to remain non-replaceable, ensuring the function address remains
> > stable and the BPF program stays attached.
>
> Can you share your actual problem to be solved?

Here is an example we recently deployed on our production servers:

  
https://lore.kernel.org/bpf/caloahbdnnba_w_nwh3-s9gaxw0+vkulth1gy5hy9yqgeo4c...@mail.gmail.com/

In one of our specific clusters, we needed to send BGP traffic out
through specific NICs based on the destination IP. To achieve this
without interrupting service, we live-patched
bond_xmit_3ad_xor_slave_get(), added a new hook called
bond_get_slave_hook(), and then ran a BPF program attached to that
hook to select the outgoing NIC from the SKB. This allowed us to
rapidly deploy the feature with zero downtime.

[...]

-- 
Regards
Yafang

Reply via email to