On Tue, Feb 13, 2018 at 7:47 AM, Kees Cook <keesc...@chromium.org> wrote:
> On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sar...@sargun.me> wrote:
>> This patchset enables seccomp filters to be written in eBPF. Although
>> this patchset doesn't introduce much of the functionality enabled by
>> eBPF, it lays the groundwork for it.
>> It also introduces the capability to dump eBPF filters via the PTRACE
>> API so that CHECKPOINT_RESTORE will be satisfied.
>> In the attached samples, there's an example of this. One can then use
>> BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
>> and use that at reload time.
>> The primary reason for not adding maps support in this patchset is
>> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
>> If we have a map that the BPF program can read, it can potentially
>> "change" privileges after running. Allowing only writes seems
>> safe, because the filter stays pure and side-effect free, and therefore
>> does not negatively affect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
>> to an agreement, this can be in a follow-up patchset.
> What's the reason for adding eBPF support? seccomp shouldn't need it,
> and it only makes the code more complex. I'd rather stick with cBPF
> until we have an overwhelmingly good reason to use eBPF as a "native"
> seccomp filter language.
1) The userspace tooling for eBPF is much better than the userspace
tooling for cBPF. Our use case is specifically to optimize Docker
policies. This is roughly what their seccomp policy looks like:
It would be much nicer to be able to leverage eBPF to write this in C,
or any of the other languages targeting eBPF. In addition, if we
have write-only maps, we can exfiltrate information from seccomp, like
arguments and errors, in a relatively cheap way compared to cBPF, and
then extract this via the bcc stack. Writing cBPF via C macros is a
pain, and the off-the-shelf cBPF libraries are getting no love. The
eBPF community is *exploding* with contributions.
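
To illustrate what writing cBPF via C macros looks like, here is a
minimal sketch of a seccomp filter built with the BPF_STMT/BPF_JUMP
macros from <linux/filter.h>. The policy itself (allow read, write and
exit_group on x86_64, fail everything else with EPERM) and the names
example_filter/example_prog are made up for illustration; this is not
the Docker policy. Note that every jump offset has to be counted by
hand.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Hypothetical example policy, not taken from any real profile. */
static struct sock_filter example_filter[] = {
	/* [0] Load the architecture field of struct seccomp_data. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
	/* [1] If it is x86_64, skip the kill at [2]; otherwise fall through. */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
	/* [2] Unexpected architecture: kill. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
	/* [3] Load the syscall number. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* [4]-[6] One compare per allowed syscall; each jumps to ALLOW at [8]. */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 3, 0),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 2, 0),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 1, 0),
	/* [7] Default: fail the syscall with EPERM. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	/* [8] Allow. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Program descriptor in the form expected by PR_SET_SECCOMP. */
static struct sock_fprog example_prog = {
	.len = sizeof(example_filter) / sizeof(example_filter[0]),
	.filter = example_filter,
};
```

Installing it would be prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) followed
by prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &example_prog). Adding or
removing one syscall means re-counting the jump targets, which is the
part that gets painful as the policy grows.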
2) In my testing, which so far has been very rudimentary, rewriting
the policy that libseccomp generates from the Docker policy to use
eBPF and eBPF maps performs much better than cBPF. The
specific case tested was using a BPF array to look up rules for a
particular syscall. In a super trivial test, this had about 5% lower
latency than using traditional branches. If you need more evidence of
this, I can work a little bit more on the maps related patches, and
see if I can get some more benchmarking. From my understanding, we
would need to add "sealing" support for maps, in which they can be
marked as read-only, and only at that point should an eBPF seccomp
program be able to read from them.
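
As a rough userspace sketch of the two lookup strategies and of the
"sealing" idea (plain C, not kernel code and not the patches
themselves): check_branches() mimics a chain of compares like a cBPF
filter, while struct rule_table mimics an array indexed by syscall
number that only becomes readable once it is sealed. All names here
(rule_table, table_set, table_seal, table_lookup, MAX_SYSCALL) are
hypothetical and only meant to show the shape of the idea.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum action { ACT_DENY = 0, ACT_ALLOW = 1 };

/* Hypothetical upper bound on syscall numbers for this sketch. */
#define MAX_SYSCALL 512

/* Branch-chain style: one comparison per rule, O(n) in the rule count. */
static enum action check_branches(const uint32_t *allowed, size_t n,
				  uint32_t nr)
{
	for (size_t i = 0; i < n; i++)
		if (allowed[i] == nr)
			return ACT_ALLOW;
	return ACT_DENY;
}

/* Array style: one bounds check plus one indexed load per lookup. */
struct rule_table {
	uint8_t action[MAX_SYSCALL];
	bool sealed;	/* once set, the table is read-only */
};

/* Writes are only accepted before sealing. */
static bool table_set(struct rule_table *t, uint32_t nr, enum action a)
{
	if (t->sealed || nr >= MAX_SYSCALL)
		return false;
	t->action[nr] = (uint8_t)a;
	return true;
}

static void table_seal(struct rule_table *t)
{
	t->sealed = true;
}

/* Reads are only honored after sealing; anything else is denied. */
static enum action table_lookup(const struct rule_table *t, uint32_t nr)
{
	if (!t->sealed || nr >= MAX_SYSCALL)
		return ACT_DENY;
	return (enum action)t->action[nr];
}
```

The point of the seal is that the filter's answer for a given syscall
cannot change after the table becomes readable, which is the property
that matters for PR_SET_NO_NEW_PRIVS.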
3) Eventually, I'd like to use some more advanced capabilities of
eBPF, like being able to rewrite arguments safely (not things referred
to by pointers, but just plain old arguments).
>> Sargun Dhillon (3):
>>   bpf, seccomp: Add eBPF filter capabilities
>>   seccomp, ptrace: Add a mechanism to retrieve attached eBPF seccomp
>>   bpf: Add eBPF seccomp sample programs
>>
>>  arch/Kconfig                 |   7 ++
>>  include/linux/bpf_types.h    |   3 +
>>  include/linux/seccomp.h      |  12 +++
>>  include/uapi/linux/bpf.h     |   2 +
>>  include/uapi/linux/ptrace.h  |   5 +-
>>  include/uapi/linux/seccomp.h |  15 ++--
>>  kernel/bpf/syscall.c         |   1 +
>>  kernel/ptrace.c              |   3 +
>>  kernel/seccomp.c             | 185
>>  samples/bpf/Makefile         |   9 +++
>>  samples/bpf/bpf_load.c       |   9 ++-
>>  samples/bpf/seccomp1_kern.c  |  17 ++++
>>  samples/bpf/seccomp1_user.c  |  34 ++++++++
>>  samples/bpf/seccomp2_kern.c  |  24 ++++++
>>  samples/bpf/seccomp2_user.c  |  66 +++++++++++++++
>>  15 files changed, 362 insertions(+), 30 deletions(-)
>>  create mode 100644 samples/bpf/seccomp1_kern.c
>>  create mode 100644 samples/bpf/seccomp1_user.c
>>  create mode 100644 samples/bpf/seccomp2_kern.c
>>  create mode 100644 samples/bpf/seccomp2_user.c
> Kees Cook
> Pixel Security