On Tue, Feb 13, 2018 at 12:33 PM, Tom Hromatka <tom.hroma...@oracle.com> wrote:
> On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sar...@sargun.me> wrote:
>> This patchset enables seccomp filters to be written in eBPF. Although
>> this patchset doesn't introduce much of the functionality enabled by
>> eBPF, it lays the groundwork for it.
>> It also introduces the capability to dump eBPF filters via the PTRACE
>> API so that CHECKPOINT_RESTORE is satisfied.
>> In the attached samples, there's an example of this. One can then use
>> BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
>> and use that at reload time.
>> The primary reason for not adding maps support in this patchset is
>> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
>> If we have a map that the BPF program can read, it can potentially
>> "change" privileges after running. It seems like doing writes only
>> is safe, because the program can stay pure and side-effect free, and
>> therefore not negatively affect PR_SET_NO_NEW_PRIVS. Nonetheless, if
>> we come to an agreement, this can be in a follow-up patchset.
> Coincidentally I also sent an RFC for adding eBPF hash maps to the seccomp
> userspace mailing list just last week:
> The kernel changes I proposed are in this email:
> In that email thread, Kees requested that I try out a binary tree in cBPF
> and evaluate its performance. I just got a rough prototype working, and
> while not as fast as an eBPF hash map, the cBPF binary tree was an
> improvement over the linear list of ifs that are currently generated. Also,
> it only required changing a single function within the libseccomp library.
> Here are the results I am currently seeing using an in-house customer's
> seccomp filter and a simplistic test program that runs getppid() thousands
> of times.
> Test Case                                              Minimum TSC ticks to make syscall
> seccomp disabled                                        620
> getppid() at the front of 306-syscall seccomp filter    722
> getppid() in middle of 306-syscall seccomp filter      1392
> getppid() at the end of the 306-syscall filter         2452
> seccomp using a 306-syscall-sized eBPF hash map         800
> cBPF filter using a binary tree                         922
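
To illustrate why the front/middle/end numbers grow while the binary tree
stays roughly flat, here is a rough comparison-count model. It is only a
sketch: the syscall numbers 0..305 are placeholders, the function names are
made up for this example, and it counts comparisons rather than TSC ticks.
It does not reproduce what libseccomp actually generates.

```python
def linear_comparisons(allowed, nr):
    """Comparisons made by a chain of 'if nr == x' tests before matching nr."""
    for count, candidate in enumerate(allowed, start=1):
        if candidate == nr:
            return count
    return len(allowed)  # no match: every test in the chain was evaluated


def tree_comparisons(sorted_allowed, nr):
    """Comparison steps made by a balanced binary search over a sorted list."""
    lo, hi, count = 0, len(sorted_allowed) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1  # one three-way comparison step per tree level
        if sorted_allowed[mid] == nr:
            return count
        if sorted_allowed[mid] < nr:
            lo = mid + 1
        else:
            hi = mid - 1
    return count


# Hypothetical 306-entry allow list; real syscall numbers would differ.
allowed = list(range(306))

for label, nr in (("front", allowed[0]), ("middle", allowed[153]), ("end", allowed[-1])):
    print(label, linear_comparisons(allowed, nr), tree_comparisons(allowed, nr))

print("worst case in tree:", max(tree_comparisons(allowed, n) for n in allowed))
```

In this model the linear chain needs 1, 154, and 306 comparisons for the
front, middle, and end entries, while the tree never needs more than 9.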
I still think that's a crazy filter. :) It should be inverted to just
check the 26 syscalls and a final "greater than" test. I would expect
it to be faster still. :)
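
A rough model of the inverted layout, under one reading of the suggestion:
the filter tests only the syscalls that are not allowed, then ends with a
single "greater than" test against the highest known syscall number. The
332-entry syscall range, the particular 26 denied numbers, and the function
names below are all made up for illustration; only the counts 306 and 26
come from the thread.

```python
def invert_allowlist(allowed, max_nr):
    """Return the syscall numbers in 0..max_nr that are NOT in the allow list."""
    allowed_set = set(allowed)
    return [nr for nr in range(max_nr + 1) if nr not in allowed_set]


def inverted_filter(denied, max_nr, nr):
    """Evaluate the inverted filter; return (verdict, comparisons made)."""
    comparisons = 0
    for candidate in denied:
        comparisons += 1
        if nr == candidate:
            return "deny", comparisons
    comparisons += 1  # the final "greater than" test
    if nr > max_nr:
        return "deny", comparisons
    return "allow", comparisons


# Hypothetical numbering: 332 known syscalls (0..331), 26 of them denied.
MAX_NR = 331
hypothetical_denied = set(range(0, 332, 13))  # 26 placeholder numbers
allowed = [nr for nr in range(MAX_NR + 1) if nr not in hypothetical_denied]

denied = invert_allowlist(allowed, MAX_NR)
print(len(allowed), len(denied))
print(inverted_filter(denied, MAX_NR, 1))    # an allowed syscall
print(inverted_filter(denied, MAX_NR, 400))  # beyond the known range
```

In this model every allowed syscall costs 27 comparisons (26 equality tests
plus the range test) regardless of where it would have sat in the original
306-entry list.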