Hi,
On Wed, Jan 21, 2026 at 02:11:41PM -0800, Bobby Eshleman wrote:
> From: Bobby Eshleman <[email protected]>
>
> Add netns logic to vsock core. Additionally, modify transport hook
> prototypes to be used by later transport-specific patches (e.g.,
> *_seqpacket_allow()).
>
> Namespaces are supported primarily by changing socket lookup functions
> (e.g., vsock_find_connected_socket()) to take into account the socket
> namespace and the namespace mode before considering a candidate socket a
> "match".
>
> This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode to
> report the mode and /proc/sys/net/vsock/child_ns_mode to set the mode
> for new namespaces.
Talking about this new feature with Daan (in CC), we discussed a
possible change to `child_ns_mode`.
Currently, if two or more administrator processes in the same namespace
set `child_ns_mode`, they race and the last write wins. Of course, after
unshare()/clone(), the process can always read `ns_mode` to check that
the new namespace got the expected mode, and retry if it did not.
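
To make the check-after-unshare flow concrete, here is a rough user-space
sketch. The two sysctl paths come from the patch description; the helper
names, the injectable `unshare` callable, and the `sysctl_dir` parameter are
made up for illustration. How to retry after a mismatch is left to the
caller, since that depends on the semantics we settle on.

```python
import os
from pathlib import Path

# Directory taken from the patch description. It is a parameter everywhere
# so the helpers can also be pointed at a scratch directory.
VSOCK_SYSCTL_DIR = Path("/proc/sys/net/vsock")


def write_child_ns_mode(mode, sysctl_dir=VSOCK_SYSCTL_DIR):
    """Request `mode` ("local" or "global") for namespaces created next."""
    (Path(sysctl_dir) / "child_ns_mode").write_text(mode + "\n")


def read_ns_mode(sysctl_dir=VSOCK_SYSCTL_DIR):
    """Return the mode of the namespace the caller is currently in."""
    return (Path(sysctl_dir) / "ns_mode").read_text().strip()


def unshare_netns():
    """Move the calling process into a new network namespace."""
    # os.unshare() needs Python 3.12+ and sufficient privileges.
    os.unshare(os.CLONE_NEWNET)


def enter_namespace_with_mode(mode, unshare=unshare_netns,
                              sysctl_dir=VSOCK_SYSCTL_DIR):
    """Write child_ns_mode, unshare, then verify ns_mode.

    Returns True if the new namespace reports `mode`, and False if it does
    not (for example because another writer changed child_ns_mode between
    our write and the unshare).
    """
    write_child_ns_mode(mode, sysctl_dir)
    unshare()
    return read_ns_mode(sysctl_dir) == mode
```

The window between the write and the unshare is exactly where a second
administrator can sneak in, which is why the final read of `ns_mode` is
needed today.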
Daan suggested a more conservative approach, allowing `child_ns_mode` to
be written only once (a bit like we did in the old version when the
child could change the mode only once). This way, most users who want
isolation can write `local` to `child_ns_mode` at startup in the init_ns.
At that point, the user can be sure that no other process (including
administrators, e.g., container managers) can change it, so all new
namespaces will have `local` mode.
I think we should support this option in some way, because it seems to
simplify user space in the most common case (ensuring isolation). I see
a few options for doing this:
1. Change the behavior of `child_ns_mode` to be written only once, but
this would limit other possible use cases where `child_ns_mode` can be
changed more than once (I don't know if Bobby had any in mind).
2. Add a new sysctl `child_ns_mode_lockin` (or something similar), which
can only be written once with a mode (local or global). A write to it
will also lock `child_ns_mode`, of course.
3. Add a new `local-locked` mode, reusing the same sysctl.
If we go for 1, could we still do it in 7.0?
2 and 3, on the other hand, may have to wait until the next release.
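
For reference, the semantics I have in mind for option 2 could be modeled
roughly as below. This is only a toy user-space model, not kernel code: the
class and method names are invented, and raising PermissionError stands in
for whatever errno a real implementation would return.

```python
class ChildNsModeModel:
    """Toy model of option 2: child_ns_mode plus a write-once lock-in."""

    VALID_MODES = ("global", "local")

    def __init__(self, initial_mode):
        self._check(initial_mode)
        self.child_ns_mode = initial_mode
        self.locked = False

    def _check(self, mode):
        if mode not in self.VALID_MODES:
            raise ValueError(f"invalid mode: {mode!r}")

    def write_child_ns_mode(self, mode):
        """Ordinary write: allowed any number of times until locked."""
        self._check(mode)
        if self.locked:
            # Assumption: the real sysctl would fail with some errno here.
            raise PermissionError("child_ns_mode is locked")
        self.child_ns_mode = mode

    def write_lockin(self, mode):
        """Write-once: sets the mode and locks child_ns_mode."""
        self._check(mode)
        if self.locked:
            raise PermissionError("child_ns_mode_lockin already written")
        self.child_ns_mode = mode
        self.locked = True
```

Option 1 would be the same model with only `write_lockin()` exposed, under
the existing `child_ns_mode` name.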
What do you think? Any comments?
Thanks,
Stefano