On Thu, Aug 4, 2016 at 12:11 AM, Sargun Dhillon <sar...@sargun.me> wrote:
> I distributed this patchset to linux-security-mod...@vger.kernel.org earlier,
> but based on the fact that the archive is down, and this is a fairly
> broad-sweeping proposal, I figured I'd grow the audience a little bit. Sorry
> if you received this multiple times.
>
> I've begun building out the skeleton of a Linux Security Module, and I'd like
> to get feedback on it. It's a skeleton, and I've only populated a few hooks,
> so I'm mostly looking for input on the general proposal, interest, and design.
> It's a minor LSM. My particular use case is one in which containers are being
> dynamically deployed to machines by internal developers in a different group.
> The point of Checmate is to act as an extensible bed for _safe_, complex
> security policies. It's nice to enable dynamic security policies that can be
> defined in C, and change as necessary, without ever having to patch or
> rebuild the kernel.
>
> For many of these containers, the security policies can be fairly nuanced. One
> particular one to take into account is network security. Often times,
> administrators want to prevent ingress, and egress connectivity except from a
> few select IPs. Egress filtering can be managed using net_cls, but without
> modifying running software, it's non-trivial to attach a filter to all sockets
> being created within a container. The inet_conn_request, socket_recvmsg,
> socket_sock_rcv_skb hooks make this trivial to implement.
>
> Other times, containers need to be throttled in places where there's not
> really a good place to impose that policy for software which isn't built
> in-house. If one wants to limit file creations/sec, or reject I/O under
> certain characteristics, there's not a great place to do it now. This gives
> engineers a mechanism to write those policies.
>
> This same flexibility can be used to take existing programs and enable safe
> BPF helpers to modify memory to allow rules to pass. One example that I
> prototyped was Docker's port mapping, which has an overhead (DNAT), and
> there's some loss of fidelity in the BSD Socket API to identify what's going
> on. Instead, we can just rewrite the port in a bind, based upon some data in
> a BPF map, and a cgroup match.
>
> I can actually see other minor security modules being implemented in Checmate,
> for example, Yama, or the recently proposed Hardchroot could be reimplemented
> in BPF. Potentially, they could even be API compatible.
>
> Although, at first, much of this sounds like seccomp, it's quite different.
> For one, what we can do in the security hooks is more complex (access to
> kernel pointers). The other side of this is we can have effects on a
> system-wide, or cgroup level. This also circumvents the need for
> CRIU-friendly policies.
>
> Lastly, the flexibility of this mechanism allows for prevention of security
> vulnerabilities which are often complex in nature and require the interaction
> of multiple hooks (CVE-2014-9717 is a good example), and although ksplice,
> and livepatch exist, they're not always easy to use, as compared to loading
> a single bpf program across all kernels.
>
> The user-facing API is exposed via prctl as it's meant to be very simple (at
> least the kernel components). It only has three operations. For a given
> security hook, you can attach a BPF program to it, which will add it to the
> set of programs that are executed when the hook is hit. You can reset a hook,
> which removes all programs associated with a given hook, and you can set a
> deny_reset flag on a hook to prevent anyone from resetting it. It's likely
> that an individual would want to set this in any production use case.

One fairly serious problem that seccomp had to overcome was dealing
with exec+setuid in the face of an attacker. The main example is "what
if we refuse to allow a program to drop privileges via a filter rule?"
For seccomp, no-new-privs was introduced for non-root users of
seccomp. Programmatic syscall (or LSM) filters need to deal with this,
and it's a bit ungainly. :)
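
For reference, a minimal sketch of the seccomp side of this: a task
without CAP_SYS_ADMIN has to set no_new_privs before the kernel will
accept a filter from it, so a later exec of a setuid binary cannot
gain privileges under a filter the attacker chose. The function names
are made up for illustration, and the allow-all filter is only a
placeholder for a real policy.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Illustrative helper (name is hypothetical). Promise that this task and
 * its descendants can never gain privileges through execve(): setuid bits
 * and file capabilities stop taking effect. The flag cannot be cleared.
 * Returns 0 on success, -1 with errno set on failure. */
int set_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Illustrative helper (name is hypothetical). Install a placeholder filter
 * that allows every syscall. Without CAP_SYS_ADMIN this fails with EACCES
 * unless no_new_privs is already set. */
int install_allow_all_filter(void)
{
	struct sock_filter insns[] = {
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
		.filter = insns,
	};

	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

An unprivileged caller would run set_no_new_privs() first and
install_allow_all_filter() second. An LSM-level filter mechanism open
to non-root users would need some equivalent ordering rule.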

Also, if you have a prctl API that already has 3 operations, you might
want to use a new syscall anyway. :)
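
To make the three operations concrete, here is a rough userspace model
of the attach / reset / deny_reset semantics as described in the cover
letter. Nothing here is taken from the patches: every name is
hypothetical, plain function pointers stand in for BPF programs, and
"first non-zero return denies" is my assumption about how multiple
attached programs would combine.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_PROGS 8 /* arbitrary limit for this model */

/* Hypothetical stand-in for a BPF program: 0 allows, non-zero denies. */
typedef int (*hook_prog_fn)(const void *ctx);

struct hook {
	hook_prog_fn progs[MAX_PROGS];
	size_t nprogs;
	int deny_reset; /* once set, hook_reset() is refused */
};

/* Operation 1: add a program to the set run when the hook is hit. */
int hook_attach(struct hook *h, hook_prog_fn fn)
{
	if (h->nprogs == MAX_PROGS)
		return -ENOSPC;
	h->progs[h->nprogs++] = fn;
	return 0;
}

/* Operation 2: remove every program, unless resets have been denied. */
int hook_reset(struct hook *h)
{
	if (h->deny_reset)
		return -EPERM;
	h->nprogs = 0;
	return 0;
}

/* Operation 3: one-way switch that blocks all future resets. */
void hook_deny_reset(struct hook *h)
{
	h->deny_reset = 1;
}

/* Run the attached programs in order; the first denial wins (assumption). */
int hook_run(const struct hook *h, const void *ctx)
{
	for (size_t i = 0; i < h->nprogs; i++) {
		int rc = h->progs[i](ctx);

		if (rc)
			return rc;
	}
	return 0;
}

/* Two trivial example programs. */
int allow_all(const void *ctx) { (void)ctx; return 0; }
int deny_all(const void *ctx) { (void)ctx; return -EPERM; }
```

Note that in this model attach still works after deny_reset, so the
policy can only get stricter. Whether the real interface should behave
that way is part of what a dedicated syscall would let you spell out.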

> On the BPF side of it, all that's involved in the work in progress is to
> move some of the tracing helpers into the shared helpers. For example,
> it's very valuable to have access to current when enforcing a hook.
> BPF programs also have access to maps, which somewhat works around
> the need for security blobs in some cases.

Just from a compatibility perspective, doesn't this end up exposing
kernel structures to userspace? What happens when the structures
change?

And from a security perspective, programmatic examination of kernel
structures means you can trivially leak kernel memory locations and
contents. Resisting these sorts of leaks needs to be addressed too.

This looks like a subset of kprobes but available to non-root users,
which looks rather scary to me at first glance. :)

-Kees

>
> I would love to know what y'all think.
>
> Sargun Dhillon (4):
>   bpf: move tracing helpers to shared helpers
>   bpf, security: Add Checmate
>   security/checmate: Add Checmate sample
>   bpf: Restrict Checmate bpf programs to current kernel ABI
>
>  include/linux/bpf.h              |   2 +
>  include/linux/checmate.h         |  38 +++++
>  include/uapi/linux/Kbuild        |   1 +
>  include/uapi/linux/bpf.h         |   1 +
>  include/uapi/linux/checmate.h    |  65 +++++++++
>  include/uapi/linux/prctl.h       |   3 +
>  kernel/bpf/helpers.c             |  34 +++++
>  kernel/bpf/syscall.c             |   2 +-
>  kernel/trace/bpf_trace.c         |  33 -----
>  samples/bpf/Makefile             |   4 +
>  samples/bpf/bpf_load.c           |  11 +-
>  samples/bpf/checmate1_kern.c     |  28 ++++
>  samples/bpf/checmate1_user.c     |  54 +++++++
>  security/Kconfig                 |   1 +
>  security/Makefile                |   2 +
>  security/checmate/Kconfig        |   6 +
>  security/checmate/Makefile       |   3 +
>  security/checmate/checmate_bpf.c |  67 +++++++++
>  security/checmate/checmate_lsm.c | 304 +++++++++++++++++++++++++++++++++++++++
>  19 files changed, 622 insertions(+), 37 deletions(-)
>  create mode 100644 include/linux/checmate.h
>  create mode 100644 include/uapi/linux/checmate.h
>  create mode 100644 samples/bpf/checmate1_kern.c
>  create mode 100644 samples/bpf/checmate1_user.c
>  create mode 100644 security/checmate/Kconfig
>  create mode 100644 security/checmate/Makefile
>  create mode 100644 security/checmate/checmate_bpf.c
>  create mode 100644 security/checmate/checmate_lsm.c
>
> --
> 2.7.4
>



-- 
Kees Cook
Nexus Security
