Hello Michal,

On Mon, May 11, 2026 at 03:07:51PM +0200, Michal Hocko wrote:
> > I work with these issues at Meta, and this approach would address a real
> > need we have.
> >
> > While livepatch could theoretically solve this problem, it's less suited
> > for rapid mitigation for a couple of reasons:
> >
> > 1) Livepatch rollout is inherently slower due to the blast radius if a
> > bug exists in the livepatch mechanism itself.
> >
> > 2) It's common to run hundreds of different kernel versions across a
> > fleet. Since livepatch is kernel-specific, a single CVE suddenly
> > requires building and deploying hundreds of individual livepatches—
> > far less practical than a simple sysfs write.
>
> LP is certainly a more laborous solution. I guess this is quite clear.
>
> It is also much safer option as it deals with all implementation details
> like consistency. All that is not done for fun. I am really wondering
> how admins are expected to a) know which kernel functions are ok/safe to
> disable and b) when it is safe to do so without introducing unsafe
> kernel state or introduce an outright bug that way.

You raise a valid concern. There's no simple answer. Making these decisions
requires deep understanding of both the code and the potential consequences.

The value proposition here (IMO) is the ability to completely disable a
code path by returning an error code (such as -EINVAL or -EBUSY) at key
entry points, rather than attempting surgical modifications.
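
To make the idea concrete, here is a rough userspace sketch of an entry
point that can be short-circuited with an error code. It is only an
illustration, not the code from the series: struct killswitch,
killswitch_set() and parse_request() are made-up names, and
killswitch_set() merely stands in for whatever the operator-facing
write ends up being.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-entry-point switch; not the interface from the series. */
struct killswitch {
	const char *name;	/* entry point being guarded */
	int engaged;		/* nonzero: refuse the call */
	int err;		/* negative errno returned while engaged */
};

static struct killswitch switches[] = {
	{ "parse_request", 0, -EINVAL },
};

/* Stand-in for the operator's write. Returns 0, or -ENOENT if unknown. */
static int killswitch_set(const char *name, int engaged, int err)
{
	for (size_t i = 0; i < sizeof(switches) / sizeof(switches[0]); i++) {
		if (strcmp(switches[i].name, name) == 0) {
			switches[i].engaged = engaged;
			switches[i].err = err;
			return 0;
		}
	}
	return -ENOENT;
}

/* Guarded entry point: the whole path is refused, nothing is patched. */
static int parse_request(const char *buf)
{
	if (switches[0].engaged)
		return switches[0].err;

	return (int)strlen(buf);	/* placeholder for the real work */
}
```

With this sketch, killswitch_set("parse_request", 1, -EBUSY) makes every
later parse_request() call return -EBUSY until the switch is cleared
again. Whether that is safe depends entirely on callers already handling
that error, which is the judgment call you are asking about.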

While this approach is far from perfect, it can serve as an effective
stopgap measure until a proper fix is deployed or a livepatch becomes
available.

> Thiking about this I can see how waiting for an official LP can be time
> consuming and sometimes creating those is far from trivial. But would it
> make sense to have automated LP creation tooling available that would
> allow to return early from a function and relly on the existing
> infrastructure to do the right thing?

Absolutely. I view this as a progression of mitigation strategies, where
the ultimate goal is deploying a properly fixed kernel, but reaching that
endpoint may require intermediate steps.

1) Fix and deploy a new kernel:
   * Pros: Lowest risk, permanent solution
   * Cons:
     - Requires reboot and extended downtime

2) Livepatch:
   * Pros: Complete mitigation, clean approach, zero downtime
   * Cons:
     - Time-intensive rollout (requires bake time and health checks)
     - Demands manual patch creation and review for each kernel version
       (i.e., kernel developer involvement is essential)

3) This approach (killswitch):
   * Pros:
     - Immediate deployment capability
     - Security engineers familiar with kernel code can act independently
   * Cons:
     - Risk of instability if the operator misjudges the impact

In short, I see killswitch as a complementary tool in the security
toolbox, not a universal solution.

--breno