For our container management we've been using a complicated and fragile setup
consisting of an LD_PRELOAD wrapper that intercepts bind and connect calls from
all containerized applications.
The setup involves per-container IPs, policy, etc., so traditional
network-only solutions that involve VRFs, netns, acls are not applicable.
Changing apps is not possible, and LD_PRELOAD doesn't work
for apps that don't use glibc, such as those written in java and golang.
BPF+cgroup looks to be the best solution for this problem.
Hence we introduce 3 hooks:
- at entry into sys_bind and sys_connect,
  to let a bpf prog inspect and modify the 'struct sockaddr' provided
  by user space and fail bind/connect when appropriate
- post sys_bind, after the port is allocated
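
As a rough illustration of what an entry hook does, here is a user-space
model in plain C. It is not BPF code and not taken from the patches:
'struct sock_addr_model', 'struct container_policy', bind4_hook_model() and
the HOOK_ALLOW/HOOK_DENY values are hypothetical names made up for this
sketch, and the port-range policy is only an assumed example of per-container
policy.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical stand-in for the context a v4 bind/connect hook would see.
 * The real context is 'struct bpf_sock_addr' in include/uapi/linux/bpf.h. */
struct sock_addr_model {
	uint32_t user_ip4;	/* IPv4 address, network byte order */
	uint16_t user_port;	/* port, network byte order */
};

/* Hypothetical per-container policy, for illustration only. */
struct container_policy {
	uint32_t container_ip4;	/* network byte order */
	uint16_t min_port;	/* host byte order */
	uint16_t max_port;	/* host byte order */
};

/* Assumed verdict values for this sketch. */
#define HOOK_DENY	0
#define HOOK_ALLOW	1

/* Model of a hook at entry into sys_bind for AF_INET:
 * look at the address from user space, fail the call if the policy is
 * violated, otherwise rewrite the address to the container's own IP. */
int bind4_hook_model(struct sock_addr_model *ctx,
		     const struct container_policy *pol)
{
	uint16_t port = ntohs(ctx->user_port);

	/* Port 0 asks the kernel to pick a port, so it cannot be judged
	 * here; that case is what a post-bind hook would be for. */
	if (port != 0 && (port < pol->min_port || port > pol->max_port))
		return HOOK_DENY;

	/* Force the socket onto the container's address. */
	ctx->user_ip4 = pol->container_ip4;
	return HOOK_ALLOW;
}
```

With a policy of 10.0.0.5 and ports 8000-8999, a bind to 0.0.0.0:8080 in this
model is rewritten to 10.0.0.5:8080 and allowed, while a bind to port 22 is
denied and the address is left untouched.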
The approach works great and has zero overhead for anyone who doesn't
use it and very low overhead when deployed.
The main question for Daniel and Dave is what approach to take
with prog types...
In this patch set we introduce 6 new program types,
since v4 programs should not be using 'struct bpf_sock_addr'->user_ip6 fields
and a different prog type for v4 and v6 helps the verifier reject such access
at load time.
Similarly, bind and connect are two different prog types too,
since only sys_connect programs can call the new bpf_bind() helper.
This approach is very different from tcp-bpf, where a single
'struct bpf_sock_ops' and a single prog type are used for different hooks.
There, the field checks are done at run-time instead of load time.
I think the approach taken by this patch set is justified,
but we may do better if we extend BPF_PROG_ATTACH cmd
with log_buf + log_size, then we should be able to combine
bind+connect+v4+v6 into single program type.
The idea is that at load time the verifier will remember a bitmask
of fields in bpf_sock_addr used by the program and helpers
that the program used, then at attach time we can check that
the hook is compatible with features used by the program and
report a human-readable error message back via log_buf.
We cannot do this right now with just EINVAL, since combinations
of errors like 'using user_ip6 field but attaching to v4 hook'
are too numerous to express as errno values.
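
To make the bitmask idea more concrete, here is a small user-space sketch in
plain C. Nothing in it is implemented by this patch set: the FEAT_* bits, the
hook enum, the hook_allowed[] table and check_attach() are hypothetical names,
and the table assumes that v6 hooks would reject user_ip4 in the same way v4
hooks reject user_ip6.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical feature bits the verifier would record at load time. */
enum {
	FEAT_USER_IP4	= 1u << 0,	/* prog accesses user_ip4 */
	FEAT_USER_IP6	= 1u << 1,	/* prog accesses user_ip6 */
	FEAT_BPF_BIND	= 1u << 2,	/* prog calls bpf_bind() */
	FEAT_COUNT	= 3,
};

/* Hypothetical attach points. */
enum hook { HOOK_BIND4, HOOK_BIND6, HOOK_CONNECT4, HOOK_CONNECT6, HOOK_MAX };

static const char *const feat_name[FEAT_COUNT] = {
	"user_ip4", "user_ip6", "bpf_bind()",
};

static const char *const hook_name[HOOK_MAX] = {
	"bind4", "bind6", "connect4", "connect6",
};

/* Features each hook tolerates (assumed, see the note above). */
static const uint32_t hook_allowed[HOOK_MAX] = {
	[HOOK_BIND4]	= FEAT_USER_IP4,
	[HOOK_BIND6]	= FEAT_USER_IP6,
	[HOOK_CONNECT4]	= FEAT_USER_IP4 | FEAT_BPF_BIND,
	[HOOK_CONNECT6]	= FEAT_USER_IP6 | FEAT_BPF_BIND,
};

/* Attach-time check: compare the features recorded at load time with what
 * the hook allows, and describe every mismatch in log_buf.
 * Returns 0 if compatible, -EINVAL otherwise. */
int check_attach(uint32_t used, enum hook h, char *log_buf, size_t log_size)
{
	uint32_t bad = used & ~hook_allowed[h];
	size_t off = 0;
	unsigned int i;

	if (!bad)
		return 0;

	/* One line per offending feature; stop once the buffer is full. */
	for (i = 0; i < FEAT_COUNT && log_buf && off < log_size; i++) {
		int n;

		if (!(bad & (1u << i)))
			continue;
		n = snprintf(log_buf + off, log_size - off,
			     "%s is not allowed on %s hook\n",
			     feat_name[i], hook_name[h]);
		if (n < 0)
			break;
		off += (size_t)n;
	}
	return -EINVAL;
}
```

In this sketch, attaching a program that recorded FEAT_USER_IP6 to HOOK_BIND4
fails with -EINVAL and leaves "user_ip6 is not allowed on bind4 hook" in
log_buf, which is the kind of message a bare errno cannot carry.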
This would be a bigger change. If you folks think it's worth it,
we can go with this approach, or if you think 6 new prog types
is not too bad, we can leave the patch set as-is.
Other comments on patches are welcome.
Andrey Ignatov (6):
bpf: Hooks for sys_bind
selftests/bpf: Selftest for sys_bind hooks
net: Introduce __inet_bind() and __inet6_bind
bpf: Hooks for sys_connect
selftests/bpf: Selftest for sys_connect hooks
bpf: Post-hooks for sys_bind
include/linux/bpf-cgroup.h | 68 +++-
include/linux/bpf_types.h | 6 +
include/linux/filter.h | 10 +
include/net/inet_common.h | 2 +
include/net/ipv6.h | 2 +
include/net/sock.h | 3 +
include/net/udp.h | 1 +
include/uapi/linux/bpf.h | 52 ++-
kernel/bpf/cgroup.c | 36 ++
kernel/bpf/syscall.c | 42 ++
kernel/bpf/verifier.c | 6 +
net/core/filter.c | 479 ++++++++++++++++++++++-
net/ipv4/af_inet.c | 60 ++-
net/ipv4/tcp_ipv4.c | 16 +
net/ipv4/udp.c | 14 +
net/ipv6/af_inet6.c | 47 ++-
net/ipv6/tcp_ipv6.c | 16 +
net/ipv6/udp.c | 20 +
tools/include/uapi/linux/bpf.h | 39 +-
tools/testing/selftests/bpf/Makefile | 8 +-
tools/testing/selftests/bpf/bpf_helpers.h | 2 +
tools/testing/selftests/bpf/connect4_prog.c | 45 +++
tools/testing/selftests/bpf/connect6_prog.c | 61 +++
tools/testing/selftests/bpf/test_sock_addr.c | 541 ++++++++++++++++++++++++++
tools/testing/selftests/bpf/test_sock_addr.sh | 57 +++
25 files changed, 1580 insertions(+), 53 deletions(-)
create mode 100644 tools/testing/selftests/bpf/connect4_prog.c
create mode 100644 tools/testing/selftests/bpf/connect6_prog.c
create mode 100644 tools/testing/selftests/bpf/test_sock_addr.c
create mode 100755 tools/testing/selftests/bpf/test_sock_addr.sh