For notifications with NETLINK_LISTEN_ALL_NSID the expected behavior
is the following:

- if NSID is not reported, then the event is local to the listener.
- if NSID is reported, then the event is remote, i.e., originated in
  the provided namespace that is not the same as the listener's.

Userspace applications like ovs-vswitchd expect this behavior.  And
ip monitor uses this logic for printing out [nsid current] vs [nsid N].
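
For illustration only (not part of this patch), a minimal sketch of how
a listener might apply the rule above to a received message.  The
helper name event_nsid() is hypothetical, and the fallback definition
of SOL_NETLINK is an assumption for older libc headers:

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270	/* assumption: fallback for older libc headers */
#endif

/*
 * Hypothetical helper: scan the control data of a received netlink
 * message.  Returns true and stores the NSID if the message carries
 * NETLINK_LISTEN_ALL_NSID metadata (event treated as remote), or
 * false if no NSID is reported (event treated as local).
 */
static bool event_nsid(struct msghdr *msg, int *nsid)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_NETLINK &&
		    cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID &&
		    cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
			memcpy(nsid, CMSG_DATA(cmsg), sizeof(int));
			return true;
		}
	}
	return false;
}
```

A listener written this way has no means of noticing that a reported
NSID refers to its own namespace.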

However, when a self-referential NSID is allocated for a namespace,
every local notification starts sending this ID to userspace as part
of NETLINK_LISTEN_ALL_NSID CMSG metadata.

This is problematic, because the listener can no longer tell whether
those notifications are local without making extra requests to check
whether the provided NSID is its own.  The listener also cannot
figure out the local NSID beforehand, as it can be allocated at any
point in time by other processes.

The value is practically not useful, since it's the namespace's own
ID that the application has to obtain from other sources in order to
figure out if it's the same or not.  So, for the application it's
just extra busy work with no benefit.  Moreover, applications that
do not know about this quirk may mishandle notifications with NSID
set, treating them as coming from remote namespaces while they are
actually local.  This is the case with ovs-vswitchd.

Having a self-referential NSID mapping is not something that happens
under normal circumstances, but it can be the case in specific
environments.  And it can be more common with certain container
runtimes like LXC/LXD/Incus that unintentionally trigger allocation
of the self-referential NSID via cross-namespace RTM_GETLINK requests.

A search through open-source projects doesn't reveal any projects
that use NETNSA_NSID_NOT_ASSIGNED and rely on metadata to contain
self-referential NSIDs.  Quite the opposite, ovs-vswitchd relies
on the metadata to not be present to separate local and remote
events.  'ip monitor' likewise relies on the metadata being absent
to show '[nsid current]'.  This is more of a "print 'current' if
there is nothing to print" situation, but it can still be a little
confusing for the user to see an ID for a local event.

Fixes: 59324cf35aba ("netlink: allow to listen "all" netns")
Reported-by: Matteo Perin <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
---
 net/netlink/af_netlink.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2aeb0680807d6..607ab4e4ac697 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1482,9 +1482,11 @@ static void do_one_broadcast(struct sock *sk,
                p->skb2 = NULL;
                goto out;
        }
-       NETLINK_CB(p->skb2).nsid = peernet2id(sock_net(sk), p->net);
-       if (NETLINK_CB(p->skb2).nsid != NETNSA_NSID_NOT_ASSIGNED)
-               NETLINK_CB(p->skb2).nsid_is_set = true;
+       if (!net_eq(sock_net(sk), p->net)) {
+               NETLINK_CB(p->skb2).nsid = peernet2id(sock_net(sk), p->net);
+               if (NETLINK_CB(p->skb2).nsid != NETNSA_NSID_NOT_ASSIGNED)
+                       NETLINK_CB(p->skb2).nsid_is_set = true;
+       }
        val = netlink_broadcast_deliver(sk, p->skb2);
        if (val < 0) {
                netlink_overrun(sk);
-- 
2.53.0