On Wed, Feb 18, 2026 at 11:43:42AM +0100, Stefano Garzarella wrote:
> On Wed, Feb 18, 2026 at 11:02:02AM +0100, Stefano Garzarella wrote:
> > On Tue, Feb 17, 2026 at 05:45:10PM -0800, Bobby Eshleman wrote:
> > > From: Bobby Eshleman <[email protected]>
> > > 
> > > To improve the security posture of vsock namespacing, this patch locks
> > > down the vsock child_ns_mode sysctl setting with a write-once policy.
> > > The user may write to child_ns_mode only once in each namespace, making
> > > changes to either local or global mode be irreversible.
> > > 
> > > This avoids security breaches where a process in a local namespace may
> > > attempt to jailbreak into the global vsock ns space by setting
> > > child_ns_mode to "global", creating a new namespace, and accessing the
> > > global space through the new namespace.
> > 
> > Commit 6a997f38bdf8 ("vsock: prevent child netns mode switch from local
> > to global") should avoid exactly that, so I don't get this. Can you
> > elaborate more how this can happen without this patch?
> > 
> > I think here we should talk more about what we described in
> > https://lore.kernel.org/netdev/aZNNBc390y6V09qO@sgarzare-redhat/ which
> > is that two administrator processes could compete in setting
> > `child_ns_mode` and end up creating a namespace with an `ns_mode`
> > different from the one set in `child_ns_mode`. But I would also explain
> > that this can still be detected by the process by looking at `ns_mode`
> > and trying again.  With this patch, we avoid this and allow the
> > namespace manager to set it once and be sure that it cannot be changed
> > again.

Oh that's right, I lost track of the original intent when writing this.

> > 
> > > 
> > > Additionally, fix the test functions that this change would otherwise
> > > break by adding "global-parent" and "local-parent" namespaces and using
> > > them as intermediaries to spawn namespaces in the given modes. This
> > > avoids the need to change "child_ns_mode" in the init_ns. nsenter must
> > > be used because ip netns unshares the mount namespace so nested "ip
> > > netns add" breaks exec calls from the init ns.
> > 
> > I'm not sure what the policy is in netdev, but I would prefer to have
> > selftest changes in another patch (I think earlier in the series so as
> > not to break the bisection), in order to simplify backporting (e.g. in
> > CentOS Stream, to keep the backport small, I didn't backport the dozens
> > of patches for selftest that we did previously).

Sounds good! I wasn't sure if breakage so tightly coupled should be in
the same patch or not, I'm happy to split it up to ease backporting.

> > 
> > Obviously, if it's not possible and breaks the bisection, I can safely
> > skip these changes during the backport.
> > 
> > > 
> > > Test run:
> > > 
> > > 1..25
> > > ok 1 vm_server_host_client
> > > ok 2 vm_client_host_server
> > > ok 3 vm_loopback
> > > ok 4 ns_host_vsock_ns_mode_ok
> > > ok 5 ns_host_vsock_child_ns_mode_ok
> > > ok 6 ns_global_same_cid_fails
> > > ok 7 ns_local_same_cid_ok
> > > ok 8 ns_global_local_same_cid_ok
> > > ok 9 ns_local_global_same_cid_ok
> > > ok 10 ns_diff_global_host_connect_to_global_vm_ok
> > > ok 11 ns_diff_global_host_connect_to_local_vm_fails
> > > ok 12 ns_diff_global_vm_connect_to_global_host_ok
> > > ok 13 ns_diff_global_vm_connect_to_local_host_fails
> > > ok 14 ns_diff_local_host_connect_to_local_vm_fails
> > > ok 15 ns_diff_local_vm_connect_to_local_host_fails
> > > ok 16 ns_diff_global_to_local_loopback_local_fails
> > > ok 17 ns_diff_local_to_global_loopback_fails
> > > ok 18 ns_diff_local_to_local_loopback_fails
> > > ok 19 ns_diff_global_to_global_loopback_ok
> > > ok 20 ns_same_local_loopback_ok
> > > ok 21 ns_same_local_host_connect_to_local_vm_ok
> > > ok 22 ns_same_local_vm_connect_to_local_host_ok
> > > ok 23 ns_delete_vm_ok
> > > ok 24 ns_delete_host_ok
> > > ok 25 ns_delete_both_ok
> > > SUMMARY: PASS=25 SKIP=0 FAIL=0
> > 
> > IMO this can be removed from the commit message, doesn't add much value
> > other than say that all test passes.

sgtm!

> > 
> > > 
> > > Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
> > > Signed-off-by: Bobby Eshleman <[email protected]>
> > > Suggested-by: Daan De Meyer <[email protected]>
> > > Suggested-by: Stefano Garzarella <[email protected]>
> > > ---
> > > include/net/af_vsock.h                  |  6 +++++-
> > > include/net/netns/vsock.h               |  1 +
> > > net/vmw_vsock/af_vsock.c                | 10 ++++++----
> > > tools/testing/selftests/vsock/vmtest.sh | 35
> > > +++++++++++++++------------------
> > > 4 files changed, 28 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > index d3ff48a2fbe0..c7de33039907 100644
> > > --- a/include/net/af_vsock.h
> > > +++ b/include/net/af_vsock.h
> > > @@ -276,10 +276,14 @@ static inline bool vsock_net_mode_global(struct 
> > > vsock_sock *vsk)
> > >   return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> > > }
> > > 
> > > -static inline void vsock_net_set_child_mode(struct net *net,
> > > +static inline bool vsock_net_set_child_mode(struct net *net,
> > >                                       enum vsock_net_mode mode)
> > > {
> > > + if (xchg(&net->vsock.child_ns_mode_locked, 1))
> > > +         return false;
> > > +
> > >   WRITE_ONCE(net->vsock.child_ns_mode, mode);
> > > + return true;
> > > }
> > > 
> > > static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
> > > diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
> > > index b34d69a22fa8..8c855fff8039 100644
> > > --- a/include/net/netns/vsock.h
> > > +++ b/include/net/netns/vsock.h
> > > @@ -17,5 +17,6 @@ struct netns_vsock {
> > > 
> > >   enum vsock_net_mode mode;
> > >   enum vsock_net_mode child_ns_mode;
> > > + int child_ns_mode_locked;
> > > };
> > > #endif /* __NET_NET_NAMESPACE_VSOCK_H */
> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > index 9880756d9eff..35e097f4fde8 100644
> > > --- a/net/vmw_vsock/af_vsock.c
> > > +++ b/net/vmw_vsock/af_vsock.c
> > > @@ -90,14 +90,15 @@
> > > *
> > > *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current 
> > > namespace's
> > > *     mode, which is set at namespace creation and immutable thereafter.
> > > - *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode 
> > > future
> > > + *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode 
> > > future
> > > *     child namespaces will inherit when created. The initial value 
> > > matches
> > > *     the namespace's own ns_mode.
> > > *
> > > *   Changing child_ns_mode only affects newly created namespaces, not the
> > > *   current namespace or existing children. A "local" namespace cannot set
> > > - *   child_ns_mode to "global". At namespace creation, ns_mode is 
> > > inherited
> > > - *   from the parent's child_ns_mode.
> > > + *   child_ns_mode to "global". child_ns_mode is write-once, so that it 
> > > may
> > > + *   be configured and locked down by a namespace manager. At namespace
> > > + *   creation, ns_mode is inherited from the parent's child_ns_mode.
> > 
> > We just merged commit a07c33c6f2fc ("vsock: document namespace mode
> > sysctls") in the net tree, so we should update also
> > Documentation/admin-guide/sysctl/net.rst

Indeed.

> > 
> > > *
> > > *   The init_net mode is "global" and cannot be modified.
> > 
> > Maybe we should also emphasise in the documentation and in the commit
> > description that `child_ns_mode` in `init_net` also is write-once, so
> > writing `local` to that one by the init process (e.g. systemd),
> > essentially will make all the new namespaces in `local` mode. This could
> > be useful for container/namespace managers.
> > 

Sounds good.

> > > *
> > > @@ -2853,7 +2854,8 @@ static int vsock_net_child_mode_string(const struct 
> > > ctl_table *table, int write,
> > >               new_mode == VSOCK_NET_MODE_GLOBAL)
> > >                   return -EPERM;
> > > 
> > > -         vsock_net_set_child_mode(net, new_mode);
> > > +         if (!vsock_net_set_child_mode(net, new_mode))
> > > +                 return -EPERM;
> > 
> > So, if `child_ns_mode` is set to `local` but locked, writing `local`
> > again will return -EPERM, is this really what we want?
> > 
> > I'm not sure if we can relax it a bit, but then we may race between
> > reader and writer, so maybe it's fine like it is in this patch, but we
> > should document better that any writes (even same value) after the first
> > one will return -EPERM.
> 
> I think we can try in this way:
> 
> static inline bool vsock_net_set_child_mode(struct net *net,
>                                           enum vsock_net_mode mode)
> {
>       int new_locked = mode + 1;
>       int old_locked = 0;
> 
>       if (try_cmpxchg(&net->vsock.child_ns_mode_locked,
>                       &old_locked, new_locked)) {
>               WRITE_ONCE(net->vsock.child_ns_mode, mode);
>               return true;
>       }
> 
>       return old_locked == new_locked;
> }
> 
> With a comment like this near child_ns_mode_locked in struct netns_vsock:
> /* 0 = unlocked
>  * 1 = locked to GLOBAL (VSOCK_NET_MODE_GLOBAL + 1)
>  * 2 = locked to LOCAL  (VSOCK_NET_MODE_LOCAL + 1)
>  */

This looks good to me. Will change.

> 
> While writing that, I thought that we can even remove `child_ns_mode_locked`
> and use a single variable where encode the value and the state, but maybe
> it's an unnecessary extra complexity.
> 
> Stefano
> 
> > 
> > About that, should we return something different, like -EBUSY ?
> > Not a strong opinion, just to differentiate with the other check before.
> > 
> > Thanks,
> > Stefano
> 

Differentiating with -EBUSY sounds useful. Will add that and document.

Best,
Bobby

Reply via email to