On Tue, Apr 7, 2026 at 10:47 AM Song Liu <[email protected]> wrote:
>
> On Mon, Apr 6, 2026 at 7:22 PM Yafang Shao <[email protected]> wrote:
> >
> > On Tue, Apr 7, 2026 at 2:26 AM Song Liu <[email protected]> wrote:
> > >
> > > On Mon, Apr 6, 2026 at 3:55 AM Yafang Shao <[email protected]> wrote:
> > > >
> > > > On Sat, Apr 4, 2026 at 12:07 AM Song Liu <[email protected]> wrote:
> > > > >
> > > > > Hi Yafang,
> > > > >
> > > > > On Thu, Apr 2, 2026 at 2:26 AM Yafang Shao <[email protected]> wrote:
> > > > > >
> > > > > > Livepatching allows for rapid experimentation with new kernel features
> > > > > > without interrupting production workloads. However, static livepatches lack
> > > > > > the flexibility required to tune features based on task-specific attributes,
> > > > > > such as cgroup membership, which is critical in multi-tenant k8s
> > > > > > environments. Furthermore, hardcoding logic into a livepatch prevents
> > > > > > dynamic adjustments based on the runtime environment.
> > > > > >
> > > > > > To address this, we propose a hybrid approach using BPF. Our production use
> > > > > > case involves:
> > > > > >
> > > > > > 1. Deploying a Livepatch function to serve as a stable BPF hook.
> > > > > >
> > > > > > 2. Utilizing bpf_override_return() to dynamically modify the return value
> > > > > >    of that hook based on the current task's context.
> > > > >
> > > > > Could you please provide a specific use case that can benefit from this?
> > > > > AFAICT, livepatch is more flexible but risky (may cause crash); while
> > > > > BPF is safe, but less flexible. The combination you are proposing seems
> > > > > to get the worse of the two sides. Maybe it can indeed get the benefit of
> > > > > both sides in some cases, but I cannot think of such examples.
> > > >
> > > > Here is an example we recently deployed on our production servers:
> > > >
> > > >   https://lore.kernel.org/bpf/caloahbdnnba_w_nwh3-s9gaxw0+vkulth1gy5hy9yqgeo4c...@mail.gmail.com/
> > > >
> > > > In one of our specific clusters, we needed to send BGP traffic out
> > > > through specific NICs based on the destination IP. To achieve this
> > > > without interrupting service, we live-patched
> > > > bond_xmit_3ad_xor_slave_get(), added a new hook called
> > > > bond_get_slave_hook(), and then ran a BPF program attached to that
> > > > hook to select the outgoing NIC from the SKB. This allowed us to
> > > > rapidly deploy the feature with zero downtime.
> > >
> > > I guess the idea here is: keep the risk part simple, and implement
> > > it in module/livepatch, then use BPF for the flexible and programmable
> > > part safe.
> >
> > Right
> >
> > >
> > > Can we use struct_ops instead of bpf_override_return for this case?
> > > This should make the solution more flexible.
> >
> > Upstreaming struct_ops based BPF hooks is a challenging process, as
> > seen in these examples:
> >
> >   https://lwn.net/Articles/1054030/
> >   https://lwn.net/Articles/1043548/
> >
> > Even when successful, upstreaming can take a significant amount of
> > time, often longer than our production requirements allow. To bridge
> > this gap, we developed this livepatch+BPF solution. This allows us to
> > rapidly deploy new features without maintaining custom hooks in our
> > local kernel. Because these livepatch-based hooks are lightweight,
> > they minimize maintenance overhead and simplify kernel upgrades (e.g.,
> > from 6.1 to 6.18).
>
> I didn't mean upstream struct_ops.
>
> We can define the struct_ops in an OOT kernel module. Then we
> can attach BPF programs to the struct_ops. We may need
> livepatch to connect the new struct_ops to original kernel logic.
>
> I think kernel side of this solution is mostly available, but we may
> need some work on the toolchain side.
>
> Does this make sense?

Are there actual benefits to using struct_ops instead of
bpf_override_return? So far, I’ve only found it adds complexity without
much gain.

Can we add something like ALLOW_LIVEPATCH_ERROR_INJECTION() to allow
error injection on functions defined inside a livepatch?

--
Regards
Yafang
