On 11 May 2018 at 14:41, Martin KaFai Lau <ka...@fb.com> wrote: > On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote: >> On 10 May 2018 at 22:00, Martin KaFai Lau <ka...@fb.com> wrote: >> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote: >> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF >> >> programs to find out if there is a socket listening on this host, and >> >> returns a socket pointer which the BPF program can then access to >> >> determine, for instance, whether to forward or drop traffic. sk_lookup() >> >> takes a reference on the socket, so when a BPF program makes use of this >> >> function, it must subsequently pass the returned pointer into the newly >> >> added sk_release() to return the reference. >> >> >> >> By way of example, the following pseudocode would filter inbound >> >> connections at XDP if there is no corresponding service listening for >> >> the traffic: >> >> >> >> struct bpf_sock_tuple tuple; >> >> struct bpf_sock_ops *sk; >> >> >> >> populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet >> >> sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0); >> >> if (!sk) { >> >> // Couldn't find a socket listening for this traffic. Drop. >> >> return TC_ACT_SHOT; >> >> } >> >> bpf_sk_release(sk, 0); >> >> return TC_ACT_OK; >> >> >> >> Signed-off-by: Joe Stringer <j...@wand.net.nz> >> >> --- >> >> ... >> >> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto >> >> bpf_skb_get_xfrm_state_proto = { >> >> }; >> >> #endif >> >> >> >> +struct sock * >> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) { >> > Would it be possible to have another version that >> > returns a sk without taking its refcnt? >> > It may have performance benefit. >> >> Not really. The sockets are not RCU-protected, and established sockets >> may be torn down without notice. If we don't take a reference, there's >> no guarantee that the socket will continue to exist for the duration >> of running the BPF program. >> >> From what I follow, the comment below has a hidden implication which >> is that sockets without SOCK_RCU_FREE, eg established sockets, may be >> directly freed regardless of RCU. > Right, SOCK_RCU_FREE sk is the one I am concern about. > For example, TCP_LISTEN socket does not require taking a refcnt > now. Doing a bpf_sk_lookup() may have a rather big > impact on handling TCP syn flood. or the usual intention > is to redirect instead of passing it up to the stack?
I see, if you're only interested in listen sockets then probably this series could be extended with a new flag, eg something like BPF_F_SK_FIND_LISTENERS which restricts the set of possible sockets found to only listen sockets, then the implementation would call into __inet_lookup_listener() instead of inet_lookup(). The presence of that flag in the relevant register during CALL instruction would show that the verifier should not reference-track the result, then there'd need to be a check on the release to ensure that this unreferenced socket is never released. Just a thought, completely untested and I could still be missing some detail.. That said, I don't completely follow how you would expect to handle the traffic for sockets that are already established - the helper would no longer find those sockets, so you wouldn't know whether to pass the traffic up the stack for established traffic or not.