On 02/13/2018 01:35 PM, Kees Cook wrote:
On Tue, Feb 13, 2018 at 12:33 PM, Tom Hromatka <tom.hroma...@oracle.com> wrote:
On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sar...@sargun.me> wrote:
This patchset enables seccomp filters to be written in eBPF. Although
this patchset doesn't introduce much of the functionality enabled by
eBPF, it lays the groundwork for it.
It also introduces the capability to dump eBPF filters via the PTRACE
API so that CHECKPOINT_RESTORE will be satisfied.
In the attached samples, there's an example of this. One can then use
BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
and use that at reload time.
The primary reason for not adding maps support in this patchset is
to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
If we have a map that the BPF program can read, it can potentially
"change" privileges after running. It seems like doing writes only
is safe, because it can be pure and side-effect free, and therefore
not negatively affect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
to an agreement, this can be in a follow-up patchset.
Coincidentally I also sent an RFC for adding eBPF hash maps to the seccomp
userspace mailing list just last week:
The kernel changes I proposed are in this email:
In that email thread, Kees requested that I try out a binary tree in cBPF
and evaluate its performance. I just got a rough prototype working, and
while not as fast as an eBPF hash map, the cBPF binary tree was an
improvement over the linear list of ifs that are currently generated. Also,
it only required changing a single function within the libseccomp library.
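
To illustrate why a binary tree helps over a linear list of ifs, here is a
rough model in Python. It only counts lookup steps; it is not real cBPF and
not the libseccomp change. The function names and the choice of 306
consecutive syscall numbers are assumptions made for the illustration.

```python
# Illustrative model only: counts lookup steps, not real BPF instructions.

def linear_lookup(allowed, nr):
    """Walk the list in order, like a chain of 'if nr == X' tests."""
    steps = 0
    for candidate in allowed:
        steps += 1
        if candidate == nr:
            return True, steps
    return False, steps


def tree_lookup(allowed_sorted, nr):
    """Binary search over sorted syscall numbers, standing in for a
    balanced tree of comparisons; one step per node visited."""
    steps = 0
    lo, hi = 0, len(allowed_sorted) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if allowed_sorted[mid] == nr:
            return True, steps
        if allowed_sorted[mid] < nr:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, steps


# Hypothetical filter: 306 consecutive syscall numbers (count from the thread).
ALLOWED = list(range(306))

worst_linear = max(linear_lookup(ALLOWED, nr)[1] for nr in ALLOWED)
worst_tree = max(tree_lookup(ALLOWED, nr)[1] for nr in ALLOWED)
print(worst_linear, worst_tree)
```

In this model the worst case drops from 306 steps to 9. Real filters will
differ, since each tree node costs more than one BPF instruction.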
Here are the results I am currently seeing using an in-house customer's
seccomp filter and a simplistic test program that runs getppid() thousands
of times:
Test Case                                               minimum TSC ticks to make syscall
seccomp disabled                                        620
getppid() at the front of 306-syscall seccomp filter    722
getppid() in middle of 306-syscall seccomp filter       1392
getppid() at the end of the 306-syscall filter          2452
seccomp using a 306-syscall-sized eBPF hash map         800
cBPF filter using a binary tree                         922
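
For reading the numbers above, it may help to subtract the "seccomp
disabled" baseline so only the filter's own cost remains. The short Python
snippet below does that arithmetic on the figures from the table; the
dictionary keys are shortened labels chosen here, not names from the test
program.

```python
BASELINE = 620  # "seccomp disabled" row

# Minimum TSC ticks per getppid() call, copied from the table.
results = {
    "front of 306-syscall filter": 722,
    "middle of 306-syscall filter": 1392,
    "end of 306-syscall filter": 2452,
    "eBPF hash map": 800,
    "cBPF binary tree": 922,
}

# Ticks attributable to the filter itself.
overhead = {name: ticks - BASELINE for name, ticks in results.items()}

for name, extra in overhead.items():
    print(f"{name:30s} +{extra:5d} ticks ({extra / BASELINE:.0%} over baseline)")
```

Seen this way, the binary tree adds 302 ticks versus 1832 for a syscall at
the end of the linear filter, while the hash map adds 180.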
I still think that's a crazy filter. :) It should be inverted to just
check the 26 syscalls and a final "greater than" test. I would expect
it to be faster still. :)
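
One possible reading of the inversion, sketched in Python: instead of
testing each of the 306 allowed syscalls, test the 26 that are not allowed
and then do a single "greater than" check for numbers beyond the table. All
numbers below are made up for illustration (a table of syscalls 0..331 with
26 arbitrary denied entries), and the ALLOW/DENY verdict names are
placeholders, not seccomp return values.

```python
# Hypothetical numbers: a table of 332 syscalls (0..331), 26 of them denied.
MAX_NR = 331
DENIED = set(range(300, 326))  # 26 made-up denied syscall numbers
ALLOWED = [nr for nr in range(MAX_NR + 1) if nr not in DENIED]  # 306 entries


def allowlist_filter(nr):
    """Original shape: one equality test per allowed syscall, default deny."""
    tests = 0
    for candidate in ALLOWED:
        tests += 1
        if nr == candidate:
            return "ALLOW", tests
    return "DENY", tests


def inverted_filter(nr):
    """Inverted shape: test the denied syscalls, then one 'greater than'."""
    tests = 0
    for candidate in sorted(DENIED):
        tests += 1
        if nr == candidate:
            return "DENY", tests
    tests += 1  # the final range check
    if nr > MAX_NR:
        return "DENY", tests
    return "ALLOW", tests


# Compare verdicts and worst-case test counts over a range of syscall numbers.
probe = range(400)
verdicts_match = all(
    allowlist_filter(nr)[0] == inverted_filter(nr)[0] for nr in probe
)
worst_allowlist = max(allowlist_filter(nr)[1] for nr in probe)
worst_inverted = max(inverted_filter(nr)[1] for nr in probe)
print(verdicts_match, worst_allowlist, worst_inverted)
```

Under these assumptions both shapes give the same verdicts, with at most 27
tests for the inverted form against 306 for the original.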
I completely agree it's a crazy filter, but it seems to be a
common "mistake" our users are making. It would be nice to
help them out if we can.