On Fri, Oct 14, 2022 at 5:18 AM Andy Lutomirski <[email protected]> wrote: > On Thu, Oct 6, 2022, at 7:13 AM, Jann Horn wrote: > > On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <[email protected]> > > wrote: > >> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote: > >> > The check_unsafe_exec() counting of n_fs would not add up under a heavily > >> > threaded process trying to perform a suid exec, causing the suid portion > >> > to fail. This counting error appears to be unneeded, but to catch any > >> > possible conditions, explicitly unshare fs_struct on exec, if it ends up > >> > >> Isn't this a potential uapi break? Afaict, before this change a call to > >> clone{3}(CLONE_FS) followed by an exec in the child would have the > >> parent and child share fs information. So if the child e.g., changes the > >> working directory post exec it would also affect the parent. But after > >> this change here this would no longer be true. So a child changing a > >> workding directoro would not affect the parent anymore. IOW, an exec is > >> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but > >> it seems like a non-trivial uapi change but there might be few users > >> that do clone{3}(CLONE_FS) followed by an exec. > > > > I believe the following code in Chromium explicitly relies on this > > behavior, but I'm not sure whether this code is in active use anymore: > > > > https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium > > Wait, this is absolutely nucking futs. On a very quick inspection, the > sharable things like this are fs, files, sighand, and io. files and > sighand get unshared, which makes sense. fs supposedly checks for extra refs > and prevents gaining privilege. io is... ignored! At least it's not > immediately obvious that io is a problem. > > But seriously, this makes no sense at all. It should not be possible to exec > a program and then, without ptrace, change its cwd out from under it. Do we > really need to preserve this behavior?
I agree that this is pretty wild. The single user I'm aware of is Chrome, and as far as I know, they use it for establishing their sandbox on systems where unprivileged user namespaces are disabled - see <https://chromium.googlesource.com/chromium/src/+/main/docs/linux/suid_sandbox.md>. They also have seccomp-based sandboxing, but IIRC there are some small holes that mean it's still useful for them to be able to set up namespaces, like how sendmsg() on a unix domain socket can specify a file path as the destination address. (By the way, I think maybe Chrome wouldn't need this wacky trick with the shared fs_struct if the "NO_NEW_PRIVS permits chroot()" thing had ever landed that you (https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.l...@amacapital.net/) and Mickaël Salaün proposed in the past... or alternatively, if there was a way to properly filter all the syscalls that Chrome has to permit for renderers.) (But also, to be clear, I don't speak for Chrome, this is just my understanding of how their stuff works.)
