On 12/01/16 at 10:11am, Florian Westphal wrote:
> Aside from this, XDP, like DPDK, is a kernel bypass.
> You might say 'It's just stack bypass, not a kernel bypass!'.
> But what does that mean exactly? That packets can still be passed
> onward to normal stack?
> Bypass solutions like netmap can also inject packets back to
> kernel stack again.
I have a fundamental issue with the approach of exporting packets into
user space and reinjecting them: Once the packet leaves the kernel,
any security guarantees are off. I have no control over what is
running in user space and whether whatever listener up there has been
compromised or not. To me, that's a no-go, in particular for servers
hosting multi-tenant workloads. This is one of the main reasons why
XDP, in particular in combination with BPF, is very interesting to me.
> b). with regards to a programmable data path: IFF one wants to do this
> in kernel (and that's a big if), it seems much more preferable to provide
> a config/data-based approach rather than a programmable one. If you want
> full freedom DPDK is architecturally just too powerful to compete with.
I must have missed the legal disclaimer that is usually put in front
of the DPDK marketing show :-)
I don't want full freedom. I want programmability with stack integration
at sufficient speed and the ability to benefit from the hardware
abstractions that the kernel provides.
> Proponents of XDP sometimes provide usage examples.
> Let's look at some of these.
[ I won't comment on any of the other use cases because they are of no
interest to me ]
> * Load balancer
> State-holding algorithms need sorting and searching, so also no fit for
> eBPF (could be exposed by function exports, but then can we do DoS by
> finding worst case scenarios?).
> Also again needs way to forward frame out via another interface.
> For cases where packet gets sent out via same interface it would appear
> to be easier to use port mirroring in a switch and use stochastic filtering
> on end nodes to determine which host should take responsibility.
> XDP plus: central authority over how distribution will work in case
> nodes are added/removed from pool.
> But then again, it will be easier to handle this with netmap/dpdk where
> more complicated scheduling algorithms can be used.
I agree with you if the LB is a software-based appliance in either a
dedicated VM or on dedicated bare metal.
The reality is turning out to be different in many cases though: LB
needs to be performed not only for north-south but for east-west traffic
as well. So even if I handled LB for traffic entering my datacenter in
user space, I would need the same LB for packets from my applications,
and I definitely don't want to move all of that into user space.
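To make the per-packet decision concrete, here is a minimal sketch in
plain user-space C. It is not an XDP program, and it uses simple modulo
hashing over the flow tuple rather than any particular production LB
algorithm. The names flow_key, flow_hash and pick_backend are
hypothetical and only serve the illustration; the point is that the same
small, stateless function could serve both north-south and east-west
traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: the classic 5-tuple, IPv4 only for brevity. */
struct flow_key {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t  proto;
};

/* Mix one 32-bit word into an FNV-1a hash, one byte at a time. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;              /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Hash the whole flow key; identical tuples always hash identically. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u;        /* FNV-1a 32-bit offset basis */

    h = fnv1a_u32(h, k->saddr);
    h = fnv1a_u32(h, k->daddr);
    h = fnv1a_u32(h, ((uint32_t)k->sport << 16) | k->dport);
    h = fnv1a_u32(h, k->proto);
    return h;
}

/*
 * Map a flow to a backend index in [0, n_backends).
 * Assumption: plain modulo selection, so adding or removing a backend
 * remaps most flows; a real LB would likely want something smarter.
 */
static size_t pick_backend(const struct flow_key *k, size_t n_backends)
{
    assert(n_backends > 0);
    return flow_hash(k) % n_backends;
}
```

All packets of one flow land on the same backend as long as the pool
size does not change, without keeping any per-flow state.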
> * early drop/filtering.
> While it's possible to do "u32" like filters with ebpf, all modern nics
> support ntuple filtering in hardware, which is going to be faster because
> such packet will never even be signalled to the operating system.
> For more complicated cases (e.g. doing socket lookup to check if particular
> packet does match bound socket (and expected sequence numbers etc) I don't
> see easy ways to do that with XDP (and without sk_buff context).
> Providing it via function exports is possible of course, but that will only
> result in an "arms race" where we will see special-sauce functions
> all over the place -- DoS will always attempt to go for something
> that is difficult to filter against, cf. all the recent volume-based
You probably put this last because this was the most difficult to
shoot down ;-)
The benefits of XDP for this use case are extremely obvious in combination
with local applications which need to be protected. ntuple filters won't
cut it. They are limited, and there is a cap on the rate at which they
can be configured. Any serious mitigation will require stateful filtering
with at least minimal L7 matching abilities and this is exactly where XDP