On Thu, Jun 25, 2026 at 10:50 AM Christian Brauner <[email protected]> wrote:
> The arguments I have heard from various people so far are:
>
> (1) Userspace would be able to clone a random chroot to /woot and run a
> binary from it without having to set up a complicated sandbox
> effectively making dynamically linked binaries more like static
> binaries in a sense.
>
> (2) Quote:
> "If you debootstrap/dnf a chroot to some location in your
> home dir and try to run a binary from it, that it tries to load the
> libraries from your /usr is a pretty unintuitive and not at all
> useful behavior."
>
> (3) Quote:
> "[Various remote execution things run in locked down containers that
> disable userns, which makes the sandbox impossible and hence our
> builds wouldn't work there."
FWIW I think someone also mentioned to me that it would make things
easier for them if they could build a piece of software in one
environment, bundle it up with all required libraries and such, and run
it in a very different environment, without container/sandboxing stuff
and without static linking. But I guess that's kinda niche.

> I'm discounting "Oh, userspace already allows this so why not the
> kernel.". I think that's generally a bad argument. Kernel and userspace
> aren't really alike in that regard.
>
> The userspace ORIGIN concept is guarded behind AT_SECURE. The kernel has

(To be pedantic: the userspace $ORIGIN concept is only partially gated on
AT_SECURE; glibc has an allowlist of acceptable library directories,
listed in "/lib64/ld-linux-x86-64.so.2 --list-diagnostics | grep
^path.system_dirs". But clearly we wouldn't want to mirror that in the
kernel.)

> to enforce the same rule. That means the loader now depends on the type
> of binary. I think this is a rather serious issue.

And annoyingly, the bprm->secureexec flag can change in
security_bprm_creds_from_file(), which is currently reached from
begin_new_exec(). That is called after we've already opened the
interpreter, so accessing ->secureexec state during the interpreter
lookup would require some refactoring. So I think this is a doable
change, but it would require more work.

Or we could take the easy way out and say "the kernel always rejects
this unless LSM_UNSAFE_NO_NEW_PRIVS is set", which would make it clear
that this can't lead to privilege escalation and would serve as an
incentive for people to stop doing stuff that relies on setuid binaries
or privileged apparmor/selinux transitions. :P

> First, it creates confusion in userspace what loader is used. Second, it
> means anything that any build/chroot that uses AT_SECURE binaries now
> has to use the sandboxing solution anyway or risk that some binaries use
> the system loader and others the chroot loader.
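Under the "easy way out" suggested above, a launcher that wants a
$ORIGIN-relative interpreter would first have to opt into
no_new_privs before calling execve(). The userspace side of that already
exists via prctl(); the kernel-side rule ("reject unless
LSM_UNSAFE_NO_NEW_PRIVS is set") is only a proposal in this thread, and
enter_no_new_privs() / no_new_privs_enabled() are hypothetical helper
names used for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: opt the calling process into no_new_privs.
 * The flag is inherited across fork() and execve() and cannot be
 * cleared, so later execve() calls will not gain privileges through
 * setuid/setgid bits or file capabilities.
 * Returns 0 on success, -1 on failure.
 */
static int enter_no_new_privs(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_NO_NEW_PRIVS)");
        return -1;
    }
    /* Once set, the flag cannot be cleared, so reading it back must yield 1. */
    assert(prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1);
    return 0;
}

/* Hypothetical helper: returns 1 if no_new_privs is set, 0 if not, -1 on error. */
static int no_new_privs_enabled(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

A launcher would call enter_no_new_privs() right before execve() of the
target binary. Current kernels record that state in bprm->unsafe as
LSM_UNSAFE_NO_NEW_PRIVS, which is the bit the proposed rule would key on.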
I think we would probably just fail the execve() attempt if we see
$ORIGIN in the interpreter in an AT_SECURE execution, since the
interpreter field does not allow listing multiple alternatives.

> Ignoring AT_SECURE, LSMs likely will need a say in whether that ORIGIN
> thing gets honored or not introducing yet another vector where this can
> be overridden or ignored.
>
> Also, we change long-standing kernel behavior which will be very
> surprising for any userspace that might implicitly rely on the fact that
> the system loader is used. So even if we were to do something like this
> it would very likely have to be configurable in some way.

I think the proposed patch will only change behavior if the interpreter
path starts with "$ORIGIN"? That wouldn't work on existing kernels
unless you have a directory literally named "$ORIGIN" in the cwd,
because "$ORIGIN/..." would be interpreted as a normal relative path.

> This makes this all ripe for malicious loader injection attacks. And we
> need to consider this possibility.
>
> So I'm not enthusiastic about this. I want this to be consistent.
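To make the behavior discussed above concrete, here is a minimal
userspace model of how a "$ORIGIN/"-prefixed PT_INTERP string might be
resolved. It is a sketch under stated assumptions, not the patch under
discussion: resolve_interp() is a hypothetical name, only a leading
"$ORIGIN/" is expanded, every other interpreter path passes through
unchanged (matching current behavior), and a secure execution that asks
for $ORIGIN is refused outright, as suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define ORIGIN_PREFIX "$ORIGIN/"

/*
 * Hypothetical model: resolve PT_INTERP string `interp` for the executable
 * at `exe_path`. Writes the result to `out` and returns 0, or returns a
 * negative errno value. Assumptions (not from any actual patch):
 *  - only a leading "$ORIGIN/" is expanded, to the executable's directory;
 *  - a secure (AT_SECURE) execution requesting $ORIGIN is rejected, since
 *    PT_INTERP cannot name a fallback interpreter.
 */
static int resolve_interp(const char *exe_path, const char *interp,
                          bool secureexec, char *out, size_t out_len)
{
    const size_t prefix_len = strlen(ORIGIN_PREFIX);
    int n;

    assert(out != NULL && out_len > 0);

    if (strncmp(interp, ORIGIN_PREFIX, prefix_len) != 0) {
        /* Not a $ORIGIN path: use it verbatim, as kernels do today. */
        n = snprintf(out, out_len, "%s", interp);
        return (n < 0 || (size_t)n >= out_len) ? -ENAMETOOLONG : 0;
    }

    if (secureexec)
        return -EPERM;

    /* $ORIGIN is the directory containing the executable ("." if none given). */
    const char *slash = strrchr(exe_path, '/');
    const char *dir = exe_path;
    int dir_len;
    if (slash == NULL) {
        dir = ".";
        dir_len = 1;
    } else {
        dir_len = (int)(slash - exe_path); /* 0 for "/foo", giving "/..." */
    }

    n = snprintf(out, out_len, "%.*s/%s", dir_len, dir, interp + prefix_len);
    return (n < 0 || (size_t)n >= out_len) ? -ENAMETOOLONG : 0;
}
```

The model works purely on path strings. A kernel implementation would
presumably have to derive $ORIGIN from the executable that was actually
opened (symlinks, execveat() on a file descriptor), which this sketch
does not attempt.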

