Hannes Frederic Sowa <[email protected]> writes:

> On 18.05.2016 01:12, Eric W. Biederman wrote:
>>
>> While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
>> bpf filesystem. Looking at the code I saw a broken usage of mount_ns
>> with current->nsproxy->mnt_ns. As the code does not acquire a reference
>> to the mount namespace it can not possibly be correct to store the mount
>> namespace on the superblock as it does.
>>
>> Replace mount_ns with mount_nodev so that each mount of the bpf
>> filesystem returns a distinct instance, and the code is not utterly
>> broken.
>>
>> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>> ---
>>
>> No one should care about this change, as userspace typically only mounts
>> things once and does not depend on things in one mount not showing up
>> in another. Can someone who actually uses the bpf filesystem please
>> verify this.
>>
>> This needs to be fixed as the existing code is broken beyond words that
>> I know how to express.
>
> The idea is to have the bpf filesystem as a singleton per mnt-namespace
> to prevent endless instances being created and kernel resources being
> hogged by pinning them to hard to discover bpf mounts.

There is no method in the kernel to support a singleton per mount
namespace. Mount propagation ruins that idea, and in most recent
distros mount propagation is enabled by default (it is something you
can opt out of later but not opt into later).

In general, convention is a much better defense against endless
instances. Having just fought a similar fight with devpts (because
things went horribly wrong), you are much better off telling people to
be careful about how they use things than trying to prevent them from
using things wrong. Especially if we are still at the "the idea is"
stage rather than a stage where changing this will actually break
deployed implementations.

> Do you see any problem with adding appropriate reference counts?

Honestly my head hurts thinking about it. Technically reference counts
would fix one aspect of it, but the whole situation really sucks.

Especially in a world of mount propagation, where these mounts
propagate between mount namespaces, and where people choose to share or
not based on criteria other than the mount namespace, attempting a
one-fs-per-mount-namespace policy is just bizarre, bordering on
completely broken. Even if implemented correctly.

Filesystems do not know and should not care about the mount namespace
they are mounted in. These are and should remain independent concepts.
Your implementation and attempted semantics violate that horribly, and
I can't see a way to achieve what you were trying to achieve. The VFS
just doesn't work that way.

Eric
