On Thu, Jun 11, 2026 at 8:58 PM Petr Mladek <[email protected]> wrote:
>
> On Tue 2026-06-09 18:00:55, Petr Mladek wrote:
> > On Sun 2026-06-07 21:16:55, Yafang Shao wrote:
> > I would write something like:
> >
> > <proposal>
> > Practice has shown that the current semantics of the patch.replace flag
> > are not ideal.
> >
> > Atomic replace is disabled by default, and the no-replace mode allows
> > unrestricted installation of many livepatches in parallel. The author and
> > the administrator are fully responsible for preventing problems caused
> > by building and installing incompatible livepatches.
> >
> > The safest mode, atomic replace, must be explicitly enabled by
> > setting "patch.replace = true". It is all or nothing: a livepatch
> > with .replace enabled always replaces all already installed
> > livepatches. This makes it very safe, but it might be too harsh.
> >
> > Improve the situation by switching the "bool .replace" flag to
> > "u32 .replace_set" and updating its semantics.
> >
> > Any .replace_set value might be associated with a set of livepatched
> > symbols, callbacks, shadow variables, and state IDs.
> >
> > A livepatch with a particular .replace_set number will atomically
> > replace any already installed livepatch with the same .replace_set
> > number. By definition, there can only ever be one active livepatch
> > for a given .replace_set number.
> >
> > In contrast, livepatches with different .replace_set numbers
> > must not modify the same function or use a state with the same
> > ID [*]. Any attempt to load an incompatible livepatch will be
> > rejected.
> >
> > Summary:
> >
> > The safest mode, in which any livepatch replaces any other livepatch,
> > will be the default. It stays in effect as long as all livepatches
> > keep .replace_set = 0.
> >
> > It will be possible to install more livepatches in parallel by
> > using different .replace_set numbers. These livepatches can be
> > updated independently using the atomic replace feature as long
> > as the new version does not break compatibility. The kernel will
> > reject a livepatch that would modify a function or livepatch state
> > already modified by a livepatch from another replace set.
> >
> > [*] The compatibility check of callbacks and shadow variables will
> >     be improved later by reworking their semantics. This is work
> >     in progress, see [0].
> > </proposal>
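A minimal userspace sketch of the load-time decision described in the proposal, assuming a simplified model where each livepatch is only a replace_set number plus a NULL-terminated list of patched function names. struct lp, lp_check_load() and the other names are hypothetical, not kernel APIs, and states, callbacks and shadow variables are left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified livepatch model; NOT the kernel's struct klp_patch. */
struct lp {
	unsigned int replace_set;
	const char *const *funcs;	/* NULL-terminated list of patched functions */
};

enum lp_verdict { LP_ADD, LP_REPLACE, LP_REJECT };

/* Does @p patch the function @func? */
static bool lp_patches_func(const struct lp *p, const char *func)
{
	for (const char *const *f = p->funcs; *f; f++)
		if (strcmp(*f, func) == 0)
			return true;
	return false;
}

/* Do @a and @b patch at least one common function? */
static bool lp_conflict(const struct lp *a, const struct lp *b)
{
	for (const char *const *f = a->funcs; *f; f++)
		if (lp_patches_func(b, *f))
			return true;
	return false;
}

/*
 * Decide what loading @np means given @n installed livepatches:
 * - same replace_set: the installed livepatch is atomically replaced;
 * - different replace_set touching the same function: reject.
 */
static enum lp_verdict lp_check_load(const struct lp *installed, size_t n,
				     const struct lp *np)
{
	bool replaces = false;

	assert(np != NULL);
	for (size_t i = 0; i < n; i++) {
		if (installed[i].replace_set == np->replace_set) {
			replaces = true;
			continue;
		}
		if (lp_conflict(&installed[i], np))
			return LP_REJECT;
	}
	return replaces ? LP_REPLACE : LP_ADD;
}
```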
> >
> > > Link: https://github.com/pmladek/linux/tree/klp-state-transfer-v1-iter12 [0]
> >
> > I have realized that I actually sent "v1-iter12" to the public
> > mailing list as the official v1. So we could use:
> >
> > Link: https://lore.kernel.org/all/[email protected]/ [0]
> >
> >
> > New idea:
> >
> > I briefly discussed the new semantics with Miroslav when I met
> > him in person, and he was a bit concerned. As an OS distributor, we
> > might want to be sure that our livepatches can be installed in the
> > safest way. So we might still want to preserve the "replace all"
> > semantics to make sure that our livepatches will not break anything.
>
> I thought more about it, and we will need some solution to preserve
> the replace_all functionality.
>
> A few serious 0-day vulnerabilities were reported recently.
> We discussed the possibility of shipping a quick fix as a livepatch,
> or of customers fixing them themselves with a livepatch.
> Such a livepatch would need to be installed in parallel with
> the official livepatch fixing older bugs. But the next official
> cumulative livepatch would need to replace it.
>
> The above scenario will no longer work with the "replace_set"
> handling as currently proposed. The hotfix would need to use another
> "replace_set" so that it can be installed in parallel.
> But the next cumulative livepatch would not be able to replace
> it because it would need to modify the same function.
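The dead end can be reproduced with a tiny ownership model. This sketch assumes a hypothetical table recording which replace_set currently patches each function; owner_table, claim(), may_patch() and the function names in the usage are illustrative only, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_OWNED 16

/* Hypothetical table: which replace_set currently patches each function. */
struct owner_table {
	const char *func[MAX_OWNED];
	unsigned int set[MAX_OWNED];
	size_t n;
};

/* Record that a livepatch from @set patches @func. */
static void claim(struct owner_table *t, unsigned int set, const char *func)
{
	assert(t->n < MAX_OWNED);
	t->func[t->n] = func;
	t->set[t->n] = set;
	t->n++;
}

/* May a livepatch from @set patch every function in NULL-terminated @funcs? */
static bool may_patch(const struct owner_table *t, unsigned int set,
		      const char *const *funcs)
{
	assert(t != NULL && funcs != NULL);
	for (; *funcs; funcs++)
		for (size_t i = 0; i < t->n; i++)
			if (strcmp(t->func[i], *funcs) == 0 && t->set[i] != set)
				return false;	/* owned by another replace set */
	return true;
}
```

With the hotfix owning its function in set 1, a set-0 cumulative livepatch that also carries the hotfix is refused, which is the situation described above.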
>
> I consulted AI (claude-sonnet-4.6) about this, and it gave the following
> feedback/ideas ;-)
>
> > I thought about 4 approaches:
> >
> > 1. Make .replace_set=0 special so that it will always replace
> >    everything. Similar to the current .replace=true mode.
> >
> >    Customers will still be able to install custom livepatches
> >    later with .replace_set != 0. But the "0" livepatch will
> >    always wipe them out.
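Option 1 boils down to a single predicate. A sketch, assuming a hypothetical helper opt1_replaces() that tells whether a newly loaded livepatch replaces an installed one; the special case for 0 is the asymmetry pointed out in the reply:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate for option 1: does a livepatch loaded with
 * @new_set replace an installed livepatch from @installed_set? */
static bool opt1_replaces(unsigned int new_set, unsigned int installed_set)
{
	/* Special case: set 0 wipes out every installed livepatch. */
	if (new_set == 0)
		return true;
	/* Otherwise only the livepatch from the same set is replaced. */
	return new_set == installed_set;
}
```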
>
> This is not ideal because it is asymmetric. Why is "0" special?
>
>
> > 2. Use two flags in the livepatch, for example:
> >
> >      a. Rename .replace to .replace_all. A livepatch with this
> >         flag set will always wipe out all other livepatches.
> >
> >      b. Add .replace_set, which will allow installing more livepatches
> >         in parallel, atomically replacing livepatches with the same
> >         .replace_set, and checking compatibility, as described above.
> >
> >     It is a bit more complicated, but it is more compatible with
> >     the current state. And it removes the special meaning of
> >     .replace_set == 0.
>
> This looks more straightforward. But the fact that "replace_all"
> replaces everything brings back the problem with the original
> "replace" flag. So, it makes this whole exercise more or less
> pointless.
>
> I had another idea: store a list of fixed bugs/CVEs in each
> livepatch. Independent bugs might be fixed by independent
> livepatches. A cumulative livepatch would then replace only
> the livepatches that previously fixed the same bugs.
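One possible reading of this idea, as a sketch: each livepatch carries a hypothetical NULL-terminated list of bug IDs, and a cumulative livepatch replaces an installed one only when it covers every bug that one fixed. Requiring full coverage rather than any overlap is an assumption made here so that a replacement never silently drops a fix; covers_all() and the bug IDs are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Is @id present in the NULL-terminated list @list? */
static bool fixes_contains(const char *const *list, const char *id)
{
	for (; *list; list++)
		if (strcmp(*list, id) == 0)
			return true;
	return false;
}

/* True when @new_fixes covers every bug listed in @old_fixes, i.e. the
 * new livepatch could replace the old one without losing a fix. */
static bool covers_all(const char *const *new_fixes,
		       const char *const *old_fixes)
{
	assert(new_fixes != NULL && old_fixes != NULL);
	for (; *old_fixes; old_fixes++)
		if (!fixes_contains(new_fixes, *old_fixes))
			return false;
	return true;
}
```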
>
> And claude-sonnet-4.6 came up with an interesting simplification.
>
> We could add:
>
> struct klp_patch {
> [...]
>       unsigned int replace_set;
>       const unsigned int *supersedes;   /* Zero-terminated array of replace_set IDs */
> [...]
> };
>
> This way, a cumulative livepatch could optionally list other
> "replace_set"s that it would replace.
>
> This would work well when both cumulative livepatches and the hotfix
> are provided by the same vendor or group.
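A sketch of how the proposed supersedes array might be evaluated, using a hypothetical struct klp_patch_model that mirrors only the two fields above; replaces_set() is an illustrative name, not an existing kernel function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the two proposed fields; not the kernel struct. */
struct klp_patch_model {
	unsigned int replace_set;
	const unsigned int *supersedes;	/* zero-terminated, may be NULL */
};

/* Would loading @np atomically replace an installed livepatch from @set? */
static bool replaces_set(const struct klp_patch_model *np, unsigned int set)
{
	assert(np != NULL);
	if (np->replace_set == set)
		return true;
	if (!np->supersedes)
		return false;
	for (const unsigned int *id = np->supersedes; *id; id++)
		if (*id == set)
			return true;
	return false;
}
```

One consequence of the zero-terminated encoding: ID 0 cannot be listed in supersedes, so a livepatch from another set could never supersede replace_set 0 through this array.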
>
> We could also allow changing it dynamically by adding a module
> option to the cumulative livepatch, e.g. supersedes=id[,id]*.
> We could add some support to the kernel for handling the module
> parameter in a standard way.

I prefer this option because it allows us to set supersedes dynamically
at runtime, avoiding values hardcoded at build time. This flexibility
is essential for handling complex production environments.
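For the module option, the kernel's existing module_param_array() macro might already cover the comma-separated syntax. The sketch below instead shows the parsing and validation in plain userspace C, assuming the IDs have to end up in a zero-terminated array as in the struct above; parse_supersedes() is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse "id[,id]*" (e.g. "1,3") into @out and append the terminating 0.
 * Returns the number of IDs parsed, or -EINVAL on malformed input.
 * IDs must be non-zero because 0 terminates the array.
 */
static int parse_supersedes(const char *s, unsigned int *out, size_t max)
{
	size_t n = 0;

	assert(s != NULL && out != NULL && max > 0);
	if (*s == '\0') {		/* empty option: supersede nothing */
		out[0] = 0;
		return 0;
	}
	for (;;) {
		char *end;
		unsigned long v;

		if (*s < '0' || *s > '9')	/* no sign, space, or empty item */
			return -EINVAL;
		errno = 0;
		v = strtoul(s, &end, 10);
		/* Reject overflow, the reserved 0, and leave room for the terminator. */
		if (errno || v == 0 || v > UINT_MAX || n + 1 >= max)
			return -EINVAL;
		out[n++] = (unsigned int)v;
		if (*end == '\0')
			break;
		if (*end != ',')
			return -EINVAL;
		s = end + 1;
	}
	out[n] = 0;
	return (int)n;
}
```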

>
> It is not trivial. But it is also not horribly complex.
> It looks like a good compromise between the requirements and
> code complexity.
>
> We really need input from others here.

Happy to hear what others think as well.

-- 
Regards
Yafang
