On Tue, Sep 02, 2025 at 05:39:33PM +0200, Stefano Garzarella wrote: > On Wed, Aug 27, 2025 at 05:31:32PM -0700, Bobby Eshleman wrote: > > From: Bobby Eshleman <bobbyeshle...@meta.com> > > > > Add NS support to vsock loopback. Sockets in a global mode netns > > communicate with each other, regardless of namespace. Sockets in a local > > mode netns may only communicate with other sockets within the same > > namespace. > > > > Add callbacks for transport to hook into the initialization and exit of > > net namespaces. > > > > The transport's init hook will be called once per netns init. Likewise > > for exit. > > > > When a set of init/exit callbacks is registered, the init callback is > > called on each already existing namespace. > > > > Only one callback registration is supported for now. Currently > > vsock_loopback is the only user. > > Why? > > In general, commit descriptions (and code comments) should focus on the > reason (why?) to simplify also the review. >
Sounds good, will improve the message/comments. I'm realizing as I type this there may be a way to avoid the callbacks altogether with pernet_operations, so I'll clarify that before next rev. > > > > Signed-off-by: Bobby Eshleman <bobbyeshle...@meta.com> > > > > --- > > Changes in v5: > > - add callbacks code to avoid reverse dependency > > - add logic for handling vsock_loopback setup for already existing > > namespaces > > --- > > include/net/af_vsock.h | 34 +++++++++++++ > > include/net/netns/vsock.h | 5 ++ > > net/vmw_vsock/af_vsock.c | 110 > > +++++++++++++++++++++++++++++++++++++++++ > > net/vmw_vsock/vsock_loopback.c | 72 ++++++++++++++++++++++++--- > > 4 files changed, 213 insertions(+), 8 deletions(-) > > > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h > > index 83f873174ba3..9333a98b9a1e 100644 > > --- a/include/net/af_vsock.h > > +++ b/include/net/af_vsock.h > > @@ -305,4 +305,38 @@ static inline bool vsock_net_check_mode(struct net > > *n1, struct net *n2) > > (vsock_net_mode(n1) == VSOCK_NET_MODE_GLOBAL && > > vsock_net_mode(n2) == VSOCK_NET_MODE_GLOBAL); > > } > > + > > +struct vsock_net_callbacks { > > + int (*init)(struct net *net); > > + void (*exit)(struct net *net); > > + struct module *owner; > > +}; > > + > > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK) > > + > > +#define vsock_register_net_callbacks(__init, __exit) \ > > + __vsock_register_net_callbacks((__init), (__exit), THIS_MODULE) > > + > > +int __vsock_register_net_callbacks(int (*init)(struct net *net), > > + void (*exit)(struct net *net), > > + struct module *owner); > > +void vsock_unregister_net_callbacks(void); > > + > > +#else > > + > > +#define vsock_register_net_callbacks(__init, __exit) do { } while (0) > > + > > +static inline int __vsock_register_net_callbacks(int (*init)(struct net > > *net), > > + void (*exit)(struct net *net), > > + struct module *owner) > > +{ > > + return 0; > > +} > > + > > +static inline void vsock_unregister_net_callbacks(void) {} > > +static inline int vsock_net_call_init(struct net *net) { return 0; } > > +static inline void vsock_net_call_exit(struct net *net) {} > > + > > +#endif /* CONFIG_VSOCKETS_LOOPBACK */ > > + > > #endif /* __AF_VSOCK_H__ */ > > diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h > > index d4593c0b8dc4..08d9a933c540 100644 > > --- a/include/net/netns/vsock.h > > +++ b/include/net/netns/vsock.h > > @@ -9,6 +9,8 @@ enum vsock_net_mode { > > VSOCK_NET_MODE_LOCAL, > > }; > > > > +struct vsock_loopback; > > + > > struct netns_vsock { > > struct ctl_table_header *vsock_hdr; > > spinlock_t lock; > > @@ -16,5 +18,8 @@ struct netns_vsock { > > /* protected by lock */ > > enum vsock_net_mode mode; > > bool written; > > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK) > > + struct vsock_loopback *loopback; > > If this is not protected by `lock`, please leave an empty line, but maybe we > should consider using locking (see comment later). > Will do. > > +#endif > > }; > > #endif /* __NET_NET_NAMESPACE_VSOCK_H */ > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c > > index 68a8875c8106..5a73d9e1a96f 100644 > > --- a/net/vmw_vsock/af_vsock.c > > +++ b/net/vmw_vsock/af_vsock.c > > @@ -134,6 +134,9 @@ > > #include <uapi/linux/vm_sockets.h> > > #include <uapi/asm-generic/ioctls.h> > > > > +static struct vsock_net_callbacks vsock_net_callbacks; > > +static DEFINE_MUTEX(vsock_net_callbacks_lock); > > + > > static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr); > > static void vsock_sk_destruct(struct sock *sk); > > static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb); > > @@ -2781,6 +2784,49 @@ static void vsock_net_init(struct net *net) > > net->vsock.mode = VSOCK_NET_MODE_GLOBAL; > > } > > > > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK) > > +static int vsock_net_call_init(struct net *net) > > +{ > > + struct vsock_net_callbacks *cbs; > > + int ret; > > + > > + mutex_lock(&vsock_net_callbacks_lock); > > + cbs = &vsock_net_callbacks; > > + > > + ret = 0; > > + if (!cbs->owner) > > + goto out; > > + > > + if (try_module_get(cbs->owner)) { > > + ret = cbs->init(net); > > + module_put(cbs->owner); > > + } > > + > > +out: > > + mutex_unlock(&vsock_net_callbacks_lock); > > + return ret; > > +} > > + > > +static void vsock_net_call_exit(struct net *net) > > +{ > > + struct vsock_net_callbacks *cbs; > > + > > + mutex_lock(&vsock_net_callbacks_lock); > > + cbs = &vsock_net_callbacks; > > + > > + if (!cbs->owner) > > + goto out; > > + > > + if (try_module_get(cbs->owner)) { > > + cbs->exit(net); > > + module_put(cbs->owner); > > + } > > + > > +out: > > + mutex_unlock(&vsock_net_callbacks_lock); > > +} > > +#endif /* CONFIG_VSOCKETS_LOOPBACK */ > > + > > static __net_init int vsock_sysctl_init_net(struct net *net) > > { > > vsock_net_init(net); > > @@ -2788,12 +2834,20 @@ static __net_init int vsock_sysctl_init_net(struct > > net *net) > > if (vsock_sysctl_register(net)) > > return -ENOMEM; > > > > + if (vsock_net_call_init(net) < 0) > > + goto err_sysctl; > > + > > return 0; > > + > > +err_sysctl: > > + vsock_sysctl_unregister(net); > > + return -ENOMEM; > > } > > > > static __net_exit void vsock_sysctl_exit_net(struct net *net) > > { > > vsock_sysctl_unregister(net); > > + vsock_net_call_exit(net); > > } > > > > static struct pernet_operations vsock_sysctl_ops __net_initdata = { > > @@ -2938,6 +2992,62 @@ void vsock_core_unregister(const struct > > vsock_transport *t) > > } > > EXPORT_SYMBOL_GPL(vsock_core_unregister); > > > > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK) > > +int __vsock_register_net_callbacks(int (*init)(struct net *net), > > + void (*exit)(struct net *net), > > + struct module *owner) > > +{ > > + struct vsock_net_callbacks *cbs; > > + struct net *net; > > + int ret = 0; > > + > > + mutex_lock(&vsock_net_callbacks_lock); > > + > > + cbs = &vsock_net_callbacks; > > + cbs->init = init; > > + cbs->exit = exit; > > + cbs->owner = owner; > > + > > + /* call callbacks on any net previously created */ > > + down_read(&net_rwsem); > > + > > + if (try_module_get(cbs->owner)) { > > + for_each_net(net) { > > + ret = cbs->init(net); > > + if (ret < 0) > > + break; > > + } > > + > > + if (ret < 0) > > + for_each_net(net) > > + cbs->exit(net); > > + > > + module_put(cbs->owner); > > + } > > + > > + up_read(&net_rwsem); > > + mutex_unlock(&vsock_net_callbacks_lock); > > + > > + return ret; > > +} > > +EXPORT_SYMBOL_GPL(__vsock_register_net_callbacks); > > + > > +void vsock_unregister_net_callbacks(void) > > +{ > > + struct vsock_net_callbacks *cbs; > > + > > + mutex_lock(&vsock_net_callbacks_lock); > > + > > + cbs = &vsock_net_callbacks; > > + cbs->init = NULL; > > + cbs->exit = NULL; > > + cbs->owner = NULL; > > + > > + mutex_unlock(&vsock_net_callbacks_lock); > > +} > > +EXPORT_SYMBOL_GPL(vsock_unregister_net_callbacks); > > IIUC this function is called only in the error path of > `vsock_loopback_init()`, but shuold we call it also in the > vsock_loopback_exit() ? > Ah right, that needs to be done there too. > > +#endif /* CONFIG_VSOCKETS_LOOPBACK */ > > + > > module_init(vsock_init); > > module_exit(vsock_exit); > > > > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c > > index 1b2fab73e0d0..f16d21711cb0 100644 > > --- a/net/vmw_vsock/vsock_loopback.c > > +++ b/net/vmw_vsock/vsock_loopback.c > > @@ -28,8 +28,19 @@ static u32 vsock_loopback_get_local_cid(void) > > > > static int vsock_loopback_send_pkt(struct sk_buff *skb) > > { > > - struct vsock_loopback *vsock = &the_vsock_loopback; > > + struct vsock_loopback *vsock; > > int len = skb->len; > > + struct net *net; > > + > > + if (skb->sk) > > + net = sock_net(skb->sk); > > + else > > + net = NULL; > > Why we can't use `virtio_vsock_skb_net` here? > No reason why not. Using it should make it more uniform. > > + > > + if (net && net->vsock.mode == VSOCK_NET_MODE_LOCAL) > > + vsock = net->vsock.loopback; > > + else > > + vsock = &the_vsock_loopback; > > > > virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb); > > queue_work(vsock->workqueue, &vsock->pkt_work); > > @@ -134,27 +145,72 @@ static void vsock_loopback_work(struct work_struct > > *work) > > } > > } > > > > -static int __init vsock_loopback_init(void) > > +static int vsock_loopback_init_vsock(struct vsock_loopback *vsock) > > { > > - struct vsock_loopback *vsock = &the_vsock_loopback; > > - int ret; > > - > > vsock->workqueue = alloc_workqueue("vsock-loopback", 0, 0); > > if (!vsock->workqueue) > > return -ENOMEM; > > > > skb_queue_head_init(&vsock->pkt_queue); > > INIT_WORK(&vsock->pkt_work, vsock_loopback_work); > > + return 0; > > +} > > + > > +static void vsock_loopback_deinit_vsock(struct vsock_loopback *vsock) > > +{ > > + if (vsock->workqueue) > > + destroy_workqueue(vsock->workqueue); > > +} > > + > > +/* called with vsock_net_callbacks lock held */ > > +static int vsock_loopback_init_net(struct net *net) > > +{ > > + if (WARN_ON_ONCE(net->vsock.loopback)) > > + return 0; > > + > > Do we need some kind of locking here? I mean when reading/setting > `net->vsock.loopback`? > I could be wrong here, but I think net->vsock.loopback being set before vsock_core_register() prevents racing with net->vsock.loopback reads. We could add a lock to make sure and to make the protection explicit though. > > + net->vsock.loopback = kmalloc(sizeof(*net->vsock.loopback), > > GFP_KERNEL); > > + if (!net->vsock.loopback) > > + return -ENOMEM; > > + > > + return vsock_loopback_init_vsock(net->vsock.loopback); > > +} > > + > > +/* called with vsock_net_callbacks lock held */ > > +static void vsock_loopback_exit_net(struct net *net) > > +{ > > + if (net->vsock.loopback) { > > + vsock_loopback_deinit_vsock(net->vsock.loopback); > > + kfree(net->vsock.loopback); > > Should we set `net->vsock.loopback` to NULL here? > Yes. > > + } > > +} > > + > > +static int __init vsock_loopback_init(void) > > +{ > > + struct vsock_loopback *vsock = &the_vsock_loopback; > > + int ret; > > + > > + ret = vsock_loopback_init_vsock(vsock); > > + if (ret < 0) > > + return ret; > > + > > + ret = vsock_register_net_callbacks(vsock_loopback_init_net, > > + vsock_loopback_exit_net); > > IIUC we need this only here because for now the only other transport > supported is vhost-vsock, and IIUC `struct vhost_vsock *` there is handled > with a map instead of a static variable, and `net` assigned when > /dev/vhost-vsock is opened, right? > Correct. The vhost map lookup is modified to account for namespaces, but vsock loopback doesn't have a map to do that. The callbacks are used to hook into the netns initialization. I wonder if it is possible to do this with just pernet_operations though... when I wrote this I was pretty laser-focused on the sysctl/procfs + netns init code, and may not have realized there may be similar hooks that aren't bound to the sysctl/proc init. I'll clarify this before the next rev. > If in the future we will need to support G2H transports, like > virtio-transport, we need to do something similar, right? > My guess is that we'll be able to avoid using these callbacks unless there is some per-net data we need to initialize. I'm guessing if we follow a similar path as using ioctl to assign the dev netns, then we won't need it. It might be moot if pernet_operations work to avoid the module circular dependency. > BTW I think we really need to exaplin this better in the commit description. > It tooks me a while to get all of this (if it's correct) > Roger that, I'll improve this going forward. Best, Bobby