On Wed, 21 Sep 2016 13:28:52 +0200 Willy Tarreau <w...@1wt.eu> wrote:
> Over the last 3 years I've been working a bit on high traffic processing
> for various reasons. It started with the wish to capture line-rate GigE
> traffic on very small fanless ARM machines and the framework has evolved
> to be used at my company as a basis for our anti-DDoS engine capable of
> dealing with multiple 10G links saturated with floods.
> I know it comes a bit late now that there is XDP, but it's my first
> vacation since then and I needed to have a bit of calm time to collect
> the patches from the various branches and put them together. Anyway I'm
> sending this here in case it can be of interest to anyone, for use or
> just to study it.
I definitely want to study it!
You mention XDP. In case you didn't notice, I've created some documentation
on XDP (it is very "live" documentation at this point and will
hopefully "materialize" later in the process). It should be a good
starting point for understanding XDP:
> I presented it in 2014 at kernel recipes :
Cool, and it even has a video!
> It now supports drivers mvneta, ixgbe, e1000e, e1000 and igb. It is
> very light, and retrieves the packets in the NIC's driver before they
> are converted to an skb, then submits them to a registered RX handler
> running in softirq context so we have the best of all worlds by
> benefitting from CPU scalability, delayed processing, and not paying
> the cost of switching to userland. Also an rx_done() function allows
> handlers to batch their processing.
Wow - it does sound a lot like XDP! I would say that sort of
validates the current direction of XDP, and shows that there are real
use-cases for this stuff.
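To make sure I understand the model: the driver hands raw buffers to a
registered handler running in softirq context, and signals end-of-batch via
rx_done(). Here is a userspace mock of how I picture that API (all names and
signatures are my guesses, not taken from your code):

```c
#include <stddef.h>

enum rx_action { RX_ACCEPT, RX_DROP };

/* In the real framework this runs in softirq context, before any
 * skb is allocated; here it is just a per-packet callback. */
typedef enum rx_action (*rx_handler_t)(const unsigned char *pkt, size_t len);

static rx_handler_t registered_handler;
static int batch_count;

static void rxh_register(rx_handler_t h) { registered_handler = h; }

/* Called once per poll so handlers can flush batched work
 * (the rx_done() idea mentioned above). */
static void rx_done(void) { batch_count = 0; }

/* Driver-side loop: feed a burst of packets, then signal end of batch. */
static int rx_poll(const unsigned char **pkts, const size_t *lens, int n)
{
    int accepted = 0;
    for (int i = 0; i < n; i++) {
        batch_count++;
        if (registered_handler(pkts[i], lens[i]) == RX_ACCEPT)
            accepted++;
    }
    rx_done();
    return accepted;
}

/* Example handler: accept only minimum-sized Ethernet frames. */
static enum rx_action drop_runts(const unsigned char *pkt, size_t len)
{
    (void)pkt;
    return len >= 60 ? RX_ACCEPT : RX_DROP;
}
```

If that is roughly right, it confirms the "best of all worlds" point: CPU
scalability from softirq, batching from the poll loop, no userland switch.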
> The RX handler returns an action
> among accepting the packet as-is, accepting it modified (eg: vlan or
> tunnel decapsulation), dropping it, postponing the processing
> (equivalent to EAGAIN), or building a new packet to send back.
I'll be very interested in studying in detail how you implemented this
and how you chose which actions to implement.
What was the need for postponing the processing (EAGAIN)?
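Just to restate the action set as I read it, as a sketch of what the
driver-side dispatch could look like (the verdict names are invented by me,
and the comments are my interpretation of each action):

```c
enum rxh_verdict {
    RXH_ACCEPT,      /* pass packet up the stack unchanged          */
    RXH_ACCEPT_MOD,  /* pass up after in-place edit (vlan, decap)   */
    RXH_DROP,        /* free the buffer, packet never becomes skb   */
    RXH_AGAIN,       /* postpone: leave in ring, retry on next poll */
    RXH_REPLY,       /* handler built a response; transmit it       */
};

/* Counters so the behaviour of the dispatch is observable. */
struct rx_stats { int passed, dropped, deferred, replied; };

static void apply_verdict(enum rxh_verdict v, struct rx_stats *st)
{
    switch (v) {
    case RXH_ACCEPT:
    case RXH_ACCEPT_MOD:
        st->passed++;      /* would be netif_receive_skb() here   */
        break;
    case RXH_DROP:
        st->dropped++;     /* buffer recycled, no skb allocation  */
        break;
    case RXH_AGAIN:
        st->deferred++;    /* descriptor not advanced, seen again */
        break;
    case RXH_REPLY:
        st->replied++;     /* would be queued on the TX ring      */
        break;
    }
}
```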
> This last function is the one requiring the most changes in existing
> drivers, but offers the widest range of possibilities. We use it to
> send SYN cookies, but I have also implemented a stateless HTTP server
> supporting keep-alive using it, achieving line-rate traffic processing
> on a single CPU core when the NIC supports it. It's very convenient to
> test various stateful TCP components as it's easy to sustain millions
> of connections per second on it.
Interesting, and a controversial use-case. One controversial use-case
I imagined for XDP was implementing a DNS accelerator that answers
simple and frequent requests. You took it a step further with an HTTP
server!
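I assume the SYN-cookie part is the usual stateless trick: derive the ISN
from the 4-tuple plus a secret, so the final ACK can be validated without
keeping any per-connection state. A toy sketch of that idea (the hash is
deliberately simplistic and is not the real kernel syncookie algorithm):

```c
#include <stdint.h>

/* Toy cookie: mix the 4-tuple with a secret.  Every step is a
 * bijection of h, so any change in the inputs changes the output. */
static uint32_t cookie_hash(uint32_t saddr, uint32_t daddr,
                            uint16_t sport, uint16_t dport, uint32_t secret)
{
    uint32_t h = secret;
    h ^= saddr;  h = (h << 7)  | (h >> 25);
    h ^= daddr;  h = (h << 11) | (h >> 21);
    h ^= ((uint32_t)sport << 16) | dport;
    h *= 2654435761u;              /* Knuth multiplicative mix */
    return h;
}

/* On SYN: reply with the cookie as our ISN.  On the final ACK:
 * recompute and compare against ack-1, with no stored state. */
static int cookie_valid(uint32_t ack_seq, uint32_t saddr, uint32_t daddr,
                        uint16_t sport, uint16_t dport, uint32_t secret)
{
    return ack_seq - 1 == cookie_hash(saddr, daddr, sport, dport, secret);
}
```

The same "all state lives in the packet" principle is presumably what makes
the keep-alive HTTP responder possible on a single core.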
> It does not support forwarding between NICs. It was my first goal
> because I wanted to implement a TAP with it, bridging the traffic
> between two ports, but figured it was adding some complexity to the
> system back then.
With all the XDP features so far, we have avoided going through the
page allocator by relying on different page-recycling tricks. When
forwarding between NICs, it is harder to do these page-recycling
tricks. I've measured that the page allocator's fast path
("recycling" the same page) costs approx 270 cycles, while the
per-packet cycle budget at 14 Mpps on this 4GHz CPU is 268 cycles.
Thus, it is a non-starter...
Did you have to modify the page allocator?
Or implement some kind of recycling?
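For the record, the 268-cycle figure is just arithmetic: 10GbE line rate
with minimum-size 64-byte frames is 14.88 Mpps, and the budget is core
frequency divided by packet rate:

```c
/* Cycles available per packet = core frequency / packet rate.
 * 10GbE line rate with 64-byte frames is 14.88 Mpps. */
static long cycles_per_packet(double hz, double pps)
{
    return (long)(hz / pps);
}
```

So a single 270-cycle call already overshoots the entire per-packet budget,
before any actual packet processing happens.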
> However since then we've implemented traffic
> capture in our product, exploiting this framework to capture without
> losses at 14 Mpps. I may find some time to try to extract it later.
> It uses the /sys API so that you can simply plug tcpdump -r on a
> file there, though there's also an mmap version which uses less CPU
> (that's important at 10G).
Interesting. I do see an XDP use-case for raw packet capture, but I've
postponed that work until later. I would be interested in how you
solved it. E.g. do you support zero-copy?
> In its current form since the initial code's intent was to limit
> core changes, it happens not to modify anything in the kernel by
> default and to reuse the net_device's ax25_ptr to attach devices
> (idea borrowed from netmap), so it can be used on an existing
> kernel just by loading the patched network drivers (yes, I know
> it's not a valid solution for the long term).
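For readers following along: as I understand the netmap-borrowed trick, the
ax25_ptr field of struct net_device is unused unless AX.25 is active, so a
module can hang its private state there without touching core structures. A
mocked-up sketch of that attach path (struct layout and all function names
are invented here; only the ax25_ptr field name matches the real kernel):

```c
#include <stddef.h>

/* Minimal mock of the kernel structure; the real one has many
 * more fields, but ax25_ptr is the only one this trick needs. */
struct net_device { void *ax25_ptr; /* unused unless AX.25 is active */ };

struct ndiv_priv { int handler_id; };   /* hypothetical private state */

static int ndiv_attach(struct net_device *dev, struct ndiv_priv *priv)
{
    if (dev->ax25_ptr)       /* already claimed (e.g. by real AX.25) */
        return -1;
    dev->ax25_ptr = priv;
    return 0;
}

static struct ndiv_priv *ndiv_get(const struct net_device *dev)
{
    return dev->ax25_ptr;
}
```

Clever as a zero-patch deployment story, even if, as you say, it is not a
long-term solution.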
> The current code is available here :
I was just about to complain that the link was broken... but it fixed
itself while I was writing this email ;-)
Can you instead explain which branch to look at?
> Please let me know if there could be some interest in rebasing it
> on more recent versions (currently 3.10, 3.14 and 4.4 are supported).
What, no support for 2.4 ;-)
> I don't have much time to assign to it since it works fine as-is,
> but will be glad to do so if that can be useful.
> Also the stateless HTTP server provided in it definitely is a nice
> use case for testing such a framework.
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org