On Fri, Apr 10, 2026 at 12:38 PM Masami Hiramatsu <[email protected]> wrote:
>
> Hi Yafang,
>
> On Thu, 2 Apr 2026 17:26:03 +0800
> Yafang Shao <[email protected]> wrote:
>
> > Livepatching allows for rapid experimentation with new kernel features
> > without interrupting production workloads. However, static livepatches lack
> > the flexibility required to tune features based on task-specific attributes,
> > such as cgroup membership, which is critical in multi-tenant k8s
> > environments. Furthermore, hardcoding logic into a livepatch prevents
> > dynamic adjustments based on the runtime environment.
> >
> > To address this, we propose a hybrid approach using BPF. Our production use
> > case involves:
> >
> > 1. Deploying a Livepatch function to serve as a stable BPF hook.
> >
> > 2. Utilizing bpf_override_return() to dynamically modify the return value
> >    of that hook based on the current task's context.
>
> First of all, I don't like this approach to test a new feature in the
> kernel, because it sounds like allowing multiple different generations
> of implementations to coexist simultaneously. The standard kernel code
> is not designed to withstand such implementations.

However, this approach is invaluable for rapidly deploying new kernel
features to production servers without downtime. Upgrading kernels
across a large fleet remains a significant challenge.

>
> For example, if you implement a well-designed framework in a specific
> subsystem, like Schedext, which allows multiple implementations extended
> with BPF to coexist, there's no problem (at least it's debatable).
>
> But if it is for any function, it is dangerous feature. Bugs that occur
> in kernels that use this functionality cannot be addressed here. They
> need to be treated the same way as out-of-tree drivers or forked kernels.
> I mean, add a tainted flag for this feature. And we don't care of it.

Agreed. This should be handled as an OOT module rather than part of
the core kernel.

> >
> > A significant challenge arises when atomic-replace is enabled. In this
> > mode, deploying a new livepatch changes the target function's address,
> > forcing a re-attachment of the BPF program. This re-attachment latency is
> > unacceptable in critical paths, such as those handling networking policies.
> >
> > To solve this, we introduce a hybrid livepatch mode that allows specific
> > patches to remain non-replaceable, ensuring the function address remains
> > stable and the BPF program stays attached.
>
> Can you share your actual problem to be solved?

Here is an example we recently deployed on our production servers:

  https://lore.kernel.org/bpf/caloahbdnnba_w_nwh3-s9gaxw0+vkulth1gy5hy9yqgeo4c...@mail.gmail.com/

In one of our specific clusters, we needed to send BGP traffic out
through specific NICs based on the destination IP. To achieve this
without interrupting service, we live-patched
bond_xmit_3ad_xor_slave_get(), added a new hook called
bond_get_slave_hook(), and then ran a BPF program attached to that
hook to select the outgoing NIC from the SKB. This allowed us to
rapidly deploy the feature with zero downtime.

[...]

--
Regards
Yafang
