On Mon, Jan 23, 2017 at 8:05 PM, David Ahern <d...@cumulusnetworks.com> wrote:
> On 1/23/17 8:37 PM, Andy Lutomirski wrote:
>> Yes, it is a bug because cgroup+bpf causes unwitting programs to be
>> subject to BPF code installed by a different, potentially unrelated
>> process.  That's a new situation.  The failure can happen when a
>> privileged supervisor (whoever runs ip vrf) runs a clueless or
>> unprivileged program (the thing calling unshare()).
>
> There are many, many ways to misconfigure networking and to run programs in a 
> context or with an input argument that causes the program to not work at all, 
> not work as expected or stop working. This situation is no different.
>
> For example, the only aspect of BPF_PROG_TYPE_CGROUP_SOCK filters that are 
> namespace based is the ifindex. You brought up the example of changing 
> namespaces where the ifindex is not defined. Alexei mentioned an example 
> where interfaces can be moved to another namespace breaking any ifindex based 
> programs. Another example is the interface can be deleted. Deleting an 
> interface with sockets bound to it does not impact the program in any way - 
> no notification, no wakeup, nothing. The sockets just don't work.

And if you use 'ip vrf' to bind to a vrf with ifindex 4 and a program
unshares netns and creates an interface with ifindex 4, then that
program will end up with its sockets magically bound to ifindex 4 and
will silently malfunction.

I can think of multiple ways to address this problem.  You could scope
the hooks to a netns (which is sort of what my patch does).  You could
find a way to force programs in a given cgroup to only execute in a
single netns, although that would probably cause other breakage.  You
could improve the BPF hook API to be netns-aware, which could plausbly
address issues related to unshare() but might get very tricky when
setns() is involved.

My point is that, in 4.10-rc, it doesn't work right, and I doubt this
problem is restricted to just 'ip vrf'.  Without some kind of change
to the way that netns and cgroup+bpf interact, anything that uses
sk_bound_dev_if or reads the ifindex on an skb will be subject to a
huge footgun that unprivileged programs can trigger and any future
attempt to make the cgroup+bpf work for unprivileged users is going to
be more complicated than it deserves to be.

(Aside: I wonder if a similar goal to 'ip vrf' could be achieved by
moving the vrf to a different network namespace and putting programs
that you want to be subject to the VRF into that namespace.)

>
> The point of using a management tool like ip is to handle the details to make 
> things sane -- which is the consistent with the whole point of ip netns vs 
> running unshare -n.

I still don't understand how you're going to make 'ip netns' make this
problem go away.  Keep in mind that there are real programs that call
the unshare() syscall directly.

>
>>
>> If the way cgroup+bpf worked was that a program did a prctl() or
>> similar to opt in to being managed by a provided cgroup-aware BPF
>> program, then I'd agree with everything you're saying.  But that's not
>> at all how this code works.
>
> I don't want an opt-in approach. The program does not need to know or even 
> care that it has some restriction. Today, someone runs 'ip netns exec NAME 
> COMMAND' the command does not need to know or care that the network namespace 
> was changed on it. Same with 'ip vrf exec' -- it is a network policy that is 
> forced on the program by the user.

I absolutely agree.  My point is that cgroup+bpf is *not* opt-in, so
the bar should be higher.  You can't blame the evil program that
called unshare() for breaking things when 'ip vrf' unavoidably sets
things up so that unshare will malfunction.  It should avoid
malfunctioning when running Linux programs that are unaware of it.
This should include programs like unshare and 'ip netns'.

Reply via email to