** Changed in: linux (Ubuntu)
Status: In Progress => Fix Committed
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1847744
Title:
seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
Status in linux package in Ubuntu:
Fix Committed
Status in linux source package in Disco:
Fix Committed
Status in linux source package in Eoan:
Fix Committed
Bug description:
SRU Justification
Impact: Recently we landed seccomp support for SECCOMP_RET_USER_NOTIF
(cf. [4]), which enables a process (the watchee) to retrieve an fd for
its seccomp filter. This fd can then be handed to another (usually more
privileged) process (the watcher). The watcher can then receive seccomp
messages about the syscalls performed by the watchee.
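As a rough illustration of the watchee side, the sketch below builds a
small filter that traps a single syscall to the notifier and asks the
kernel for the listener fd. The function names and the choice to trap
one syscall number are assumptions made for this example. The
architecture check a production filter needs is left out for brevity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NOTIFY_FILTER_LEN 4

/* Fill insns with a filter that sends syscall `nr` to the user notifier
 * and allows everything else. A real filter should also validate
 * seccomp_data.arch before looking at the syscall number. */
void build_notify_filter(struct sock_filter *insns, int nr)
{
    assert(insns != NULL);
    /* Load the syscall number. */
    insns[0] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, nr));
    /* If it matches, fall through to USER_NOTIF, else skip to ALLOW. */
    insns[1] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                            (unsigned int)nr, 0, 1);
    insns[2] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                                            SECCOMP_RET_USER_NOTIF);
    insns[3] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                                            SECCOMP_RET_ALLOW);
}

/* Install the filter on the calling task and return the notifier fd,
 * or -1 on error. The fd can then be passed to the watcher, for
 * example over a unix socket with SCM_RIGHTS. */
int install_notify_filter(int nr)
{
    struct sock_filter insns[NOTIFY_FILTER_LEN];
    struct sock_fprog prog = { .len = NOTIFY_FILTER_LEN, .filter = insns };

    build_notify_filter(insns, nr);

    /* Unprivileged tasks must set no_new_privs before loading a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;

    return (int)syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
                        SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
}
```
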
This feature is heavily used in some userspace workloads. For example,
it is currently used to intercept mknod() syscalls in user namespaces,
i.e. in containers. The mknod() syscall can be easily filtered based on
dev_t, which allows us to intercept only a very specific subset of
mknod() syscalls. Furthermore, mknod() is not possible in user
namespaces at all, so accidentally intercepting and denying syscalls
that are not in the whitelist is not a big deal. The watchee won't
notice a difference.
In contrast to mknod(), a lot of the other syscalls we intercept (e.g.
setxattr()) cannot be easily filtered because they have pointer
arguments. Additionally, some of them might actually succeed in user
namespaces (e.g. setxattr() for all "user.*" xattrs). Since we
currently cannot tell seccomp to continue from a user notifier, we are
stuck with performing all of these syscalls on behalf of the container.
This is a huge security liability, since it is extremely difficult to
correctly assume all of the necessary privileges of the calling task
such that the syscall can be successfully emulated without escaping
other security restrictions (think missing CAP_MKNOD for mknod(), or
MS_NODEV on a filesystem etc.). This can be solved by telling seccomp
to resume the syscall.
Fix: Allow the seccomp notifier to continue a syscall. A positive
discussion about this feature was triggered by a post to the
ksummit-discuss mailing list (cf. [3]) and took place during KSummit
(cf. [1]) and again at the containers/checkpoint-restore
micro-conference at Linux Plumbers.
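From the watcher's point of view, the new flag is just another way to
fill in the reply to a notification. The sketch below contrasts the
two replies: denying the syscall with an errno, and asking the kernel
to continue it. The helper names are made up for this example, and the
fallback #define is only there for builds against older uapi headers.
When the flag is set, the error and val fields are expected to be zero.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <linux/seccomp.h>

/* Fallback for uapi headers that predate the flag. */
#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
#define SECCOMP_USER_NOTIF_FLAG_CONTINUE (1UL << 0)
#endif

/* Reply that makes the trapped syscall fail with errno `err`. */
void respond_errno(const struct seccomp_notif *req,
                   struct seccomp_notif_resp *resp, int err)
{
    assert(req != NULL && resp != NULL);
    memset(resp, 0, sizeof(*resp));
    resp->id = req->id;   /* must echo the id of the request */
    resp->error = -err;   /* negative errno, kernel convention */
}

/* Reply that tells the kernel to run the trapped syscall as if it had
 * not been intercepted. error and val stay zero. */
void respond_continue(const struct seccomp_notif *req,
                      struct seccomp_notif_resp *resp)
{
    assert(req != NULL && resp != NULL);
    memset(resp, 0, sizeof(*resp));
    resp->id = req->id;
    resp->flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
}
```
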
Regression Potential: Limited to seccomp. The patchset also comes with
proper selftests in addition to the large set of seccomp selftests
that are already there. This further reduces regression potential.
Test Case:
Compile a kernel with the patch applied and run the selftests, or trap
a syscall via the notifier fd and set the newly introduced flag. The
syscall should then continue.
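For the manual variant of the test, the watcher side could look
roughly like the sketch below: receive one notification on the
notifier fd and answer it with the new flag. The function name
continue_one_notification is an assumption for this example, and
obtaining notify_fd from the watchee is not shown.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Fallback for uapi headers that predate the flag. */
#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
#define SECCOMP_USER_NOTIF_FLAG_CONTINUE (1UL << 0)
#endif

/* Wait for one trapped syscall on notify_fd and tell the kernel to
 * continue it. Returns 0 on success, -1 with errno set on failure. */
int continue_one_notification(int notify_fd)
{
    struct seccomp_notif req;
    struct seccomp_notif_resp resp;

    /* The request buffer has to be zeroed before receiving. */
    memset(&req, 0, sizeof(req));
    if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_RECV, &req) != 0)
        return -1;

    /* error and val must be zero when the continue flag is set. */
    memset(&resp, 0, sizeof(resp));
    resp.id = req.id;
    resp.flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
    if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_SEND, &resp) != 0)
        return -1;

    return 0;
}
```

On a kernel without the patch, the send step is expected to fail
because the flag is unknown. With the patch applied, the watchee
should see the normal result of the syscall.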
Target Kernels: All current LTS kernels.
Patches:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-next/seccomp&id=fb3c5386b382d4097476ce9647260fc89b34afdb
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-next/seccomp&id=0eebfed2954f152259cae0ad57b91d3ea92968e8
/* References */
[1]: https://linuxplumbersconf.org/event/4/contributions/560
[2]: https://linuxplumbersconf.org/event/4/contributions/477
[3]: https://lore.kernel.org/r/[email protected]
[4]: commit 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")