On 07.08.2018 19:24, Josh Poimboeuf wrote:
On Tue, Aug 07, 2018 at 06:13:29PM +0300, Evgenii Shatokhin wrote:
Hi,
If there are any ideas how to safely apply fixes like
http://patchwork.ozlabs.org/patch/952304/ ("netfilter: ipset: fix
ip_set_list allocation failure") via a live patch, it would be great.
Essentially, the patch replaces the calls to kmalloc and kfree with kvmalloc
and kvfree, respectively:
----------------------
@@ -2059,7 +2059,7 @@ ip_set_net_init(struct net *net)
if (inst->ip_set_max >= IPSET_INVALID_ID)
inst->ip_set_max = IPSET_INVALID_ID - 1;
- list = kcalloc(inst->ip_set_max, sizeof(struct ip_set *), GFP_KERNEL);
+ list = kvcalloc(inst->ip_set_max, sizeof(struct ip_set *), GFP_KERNEL);
if (!list)
return -ENOMEM;
inst->is_deleted = false;
@@ -2087,7 +2087,7 @@ ip_set_net_exit(struct net *net)
}
}
nfnl_unlock(NFNL_SUBSYS_IPSET);
- kfree(rcu_dereference_protected(inst->ip_set_list, 1));
+ kvfree(rcu_dereference_protected(inst->ip_set_list, 1));
}
static struct pernet_operations ip_set_net_ops = {
----------------------
However, this patch is unsafe to apply as is in runtime. Consider the
following scenario:
1. The live patch is loaded.
2. The user creates a new net ns or whatever, so that the *patched*
ip_set_net_init() is called and calls kvcalloc(). Let us assume kvcalloc()
uses vmalloc() this time.
3. The live patch is unloaded for some reason.
4. The user destroys the net ns and the *original* ip_set_net_exit() is
called. The address of the memory area allocated with vmalloc at step 2 will
be passed to kfree() here. The kernel will crash.
Currently, I cannot see how to provide such fixes in a live patch without
additional hacks.
As for the hacks, I could probably add a pre-patch callback that finds the
call to kfree() in the original ip_set_net_exit() and permanently replaces
it with a call to kvfree(). This way, even after the live patch has been
unloaded at step 3, the memory will be freed correctly at the step 4. Not
very elegant, but might work.
As I am only interested in doing this on x86_64 right now, finding the
needed call in the binary code of ip_set_net_exit() is not a problem. As for
replacement itself - well, I guess something like text_poke* or a part of
its implementation could be used here.
Still, I wonder if there is a better way to apply such fixes in runtime,
ideally, without additional hacks.
Any ideas?
I'm thinking the only non-hackish way to prevent the above scenario is
to prevent the module from being unloaded. Maybe with a
try_module_get(THIS_MODULE) from the pre-patch hook?
Thanks for a quick reply!
Yes, it is possible to make the patch module unloadable but I guess, it
might make it more difficult to update the patch in the future, in our
particular case.
Splitting this fix into a separate binary patch and making it
permanently loaded might be feasible too. It is essentially similar to a
permanent replacement of the calls in the binary code I described
before, but probably easier to implement/maintain.
However, I think there are also other races:
1) Before the patch, a netns is created, and the unpatched
ip_set_net_init() is called, which uses kcalloc().
After the patch, the netns is destroyed, and the patched
ip_set_net_exit() is called, which uses kvfree(). Boom.
I think, this case is OK: kvfree() checks how memory was allocated. If
kmalloc was used, via kvmalloc or directly, kvfree() will call kfree(),
which is what is needed.
2) During the patch, patched thread A calls ip_set_net_init(), then
unpatched thread B calls ip_set_net_exit(). Boom. This is unlikely,
but possible if net namespaces are being created/destroyed during the
patching process.
Yes, and it might also be, for example, that livepatch has converted
thread A to the patched functions, but is now waiting for a safe moment
to do that to thread B.
Replacing kfree() calls for all threads first, in one go (the old
Kpatch's way) would help, it seems.
One solution for #1 involves using a shadow variable. In the init
function, create a dummy shadow variable in ip_set_net_init() which
indicates that kvcalloc() was used. Then in the exit function, if the
shadow variable exists, use kvfree(), else use kfree().
I suppose, kvfree() already handles this.
The solution for #2 is a little trickier. One option might be to grab
the appropriate lock (net->nsid_lock?) in the pre-patch callback and
release it in the post-patch callback.
Neat. Haven't thought about that.
Another option might be to split it into two patches, with the exit
patch loaded first and the init patch loaded second.
The "exit" patch should remain permanently loaded, of course, for that
to work. Or, at least, until we can guarantee no more netns'es are going
to be destroyed.
Such multi-stage patching looks interesting. Need to think about it,
esp., about how it would do w.r.t. patch updates and downgrades.
We may want to figure out a way to support this scenario a little better
in upstream livepatch. For example, maybe we could have an interface to
split a patch into subpatches which are applied in a certain order.
More ideas welcome...
BTW, if you get it all figured out, an update to the
patch-author-guide.md would be appreciated :-)
:-)
Regards,
Evgenii
_______________________________________________
kpatch mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/kpatch