> I tried to start a discussion about eBPF support with DPDK in last DPDK
> meeting in Santa Clara:
>
> https://dpdksummit.com/Archive/pdf/2017USA/DPDK%20support%20for%20new%20hardware%20offloads.pdf
>
> In slide 17 I have some points which, IMHO, are worth discussing before
> adding this support.
>
> I can see compatibility with eBPF programs used with the kernel being just
> enough for adding this to DPDK, but if I understand where eBPF
> inside the kernel is going (regarding network stack), those programs are
> going to (or could) refer to kernel "code", so maybe this
> compatibility is just impossible to support. That would force a check for
> avoiding those programs with such references and I can see this
> would become a mess quickly.

Inside DPDK we can (and should, I think) support the eBPF ISA
(https://github.com/iovisor/bpf-docs/blob/master/eBPF.md).
Though of course it would be hard (if possible at all) to support
kernel-specific structures and functions.
And I don't think we have to go that way. Instead, it would be much more
plausible, for DPDK users, to allow eBPF inside DPDK to refer to
DPDK-specific structures/functions (rte_mbuf, etc.).
So if we have an eBPF program that accepts a pointer to raw packet data as
its input and doesn't refer to any external symbols, it should run unmodified
on both the kernel and the DPDK BPF VM.
In other cases we wouldn't have full compatibility.

>
> Assuming this issue could be overcome (or not an issue at all), maybe it
> makes sense to execute eBPF programs but, does it make sense to
> execute eBPF code? To start with, we are going to execute userspace code in
> userspace context, so some (I would say main) reasons behind
> eBPF do not apply.

Well, these days BPF is used for many different purposes inside the kernel.
Some of these purposes would be valid for DPDK apps too, others probably
wouldn't.
For example, the ability to dynamically create/destroy packet filters to
classify/trace/drop/collect statistics in a user-defined way: that, I think,
is what many users would be interested in, and it is what DPDK is missing
these days.
Again, nothing prevents people from using BPF inside DPDK for something
totally different from the current kernel usages.

> And from a performance point of view, can we ensure eBPF code execution is
> going to be at same level than
> DPDK?

Obviously performance depends on many things:
- the actual eBPF code you are going to execute
- the interpreter/JIT/HW offload you are going to use for that
- the context in which the eBPF VM will be executed
- etc.
In general, if you load a new packet filter running in SW, then yes, that
would consume some extra CPU cycles and might affect performance.
But in many cases it is an acceptable tradeoff: functionality vs performance.
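
Coming back to the compatibility point above: a program of the "raw packet
data in, no external symbols" kind could look roughly like the sketch below.
This is only an illustration. The function name match_arp and the return
convention (non-zero means "match") are assumptions made for the example and
are not taken from the patch series or from the kernel.

```c
#include <stdint.h>

/*
 * Hypothetical filter: its only input is a pointer to the first byte of an
 * untagged Ethernet frame. It calls no helpers and uses no kernel or DPDK
 * types, so nothing in it is tied to one particular BPF VM.
 * Assumption: the caller guarantees at least 14 readable bytes.
 * Assumption: a non-zero return value means "packet matched".
 */
uint64_t
match_arp(const void *pkt)
{
	const uint8_t *p = pkt;

	/* EtherType is stored big-endian at bytes 12..13 of the header. */
	uint16_t ether_type = (uint16_t)((p[12] << 8) | p[13]);

	return ether_type == 0x0806; /* 0x0806 is the EtherType for ARP */
}
```

Since the function reads the packet bytes directly and references nothing
else, the same C source can in principle be built with 'clang -target bpf'
or with a host compiler for a quick sanity check.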
Again, it is totally up to the user: if they feel they don't need that
functionality, they simply wouldn't load BPF programs.

> Would not it be a better idea to translate ebpf programs to other language
> like ... C?

clang (starting from v3.7) supports eBPF as one of its backend targets,
so it is now possible to write eBPF procedures in C (a restricted version).
In fact, all samples in patch #5 are written in pure C.

>
> Don't take me wrong. I'm not against adding eBPF at all. In fact, from my
> company's point of view, Netronome, we would be happy to have
> this with DPDK and to support eBPF offload as this is possible now with the
> netdev driver.

Konstantin

>
>
> On Fri, Mar 9, 2018 at 4:42 PM, Konstantin Ananyev
> <konstantin.anan...@intel.com> wrote:
> BPF is used quite intensively inside Linux (and BSD) kernels
> for various different purposes and proved to be extremely useful.
>
> BPF inside DPDK might also be used in a lot of places
> for a lot of similar things.
> As an example:
>  - packet filtering/tracing (aka tcpdump)
>  - packet classification
>  - statistics collection
>  - HW/PMD live-system debugging/prototyping - trace HW descriptors,
>    internal PMD SW state, etc.
>  ...
>
> All of that in a dynamic, user-defined and extensible manner.
>
> So this series introduces a new library - librte_bpf.
> librte_bpf provides an API to load and execute BPF bytecode within
> a user-space dpdk app.
> It supports a basic set of features from the eBPF spec.
> Also it introduces a basic framework to load/unload BPF-based filters
> on eth devices (right now via SW RX/TX callbacks).
>
> How to try it:
> ===============
>
> 1) run testpmd as usual and start your favorite forwarding case.
> 2) build the bpf program you'd like to load
> (you'll need clang v3.7 or above):
> $ cd test/bpf
> $ clang -O2 -target bpf -c t1.c
>
> 3) load bpf program(s):
> testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>
>
> <load-flags>: [-][J][M]
> J - use JIT generated native code, otherwise BPF interpreter will be used.
> M - assume input parameter is a pointer to rte_mbuf,
>     otherwise assume it is a pointer to first segment's data.
>
> A few examples:
>
> # to load (not JITed) dummy.o at TX queue 0, port 0:
> testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o
>
> # to load (and JIT compile) t1.o at RX queue 0, port 1:
> testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o
>
> # to load and JIT t3.o (note that it expects mbuf as an input):
> testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o
>
> If you are curious to check JIT generated native code:
> gdb -p `pgrep testpmd`
> (gdb) disas 0x7fd173c5f000,+76
> Dump of assembler code from 0x7fd173c5f000 to 0x7fd173c5f04c:
>    0x00007fd173c5f000: mov %rdi,%rsi
>    0x00007fd173c5f003: movzwq 0x10(%rsi),%rdi
>    0x00007fd173c5f008: mov 0x0(%rsi),%rdx
>    0x00007fd173c5f00c: add %rdi,%rdx
>    0x00007fd173c5f00f: movzbq 0xc(%rdx),%rdi
>    0x00007fd173c5f014: movzbq 0xd(%rdx),%rdx
>    0x00007fd173c5f019: shl $0x8,%rdx
>    0x00007fd173c5f01d: or %rdi,%rdx
>    0x00007fd173c5f020: cmp $0x608,%rdx
>    0x00007fd173c5f027: jne 0x7fd173c5f044
>    0x00007fd173c5f029: mov $0xb712e8,%rdi
>    0x00007fd173c5f030: mov 0x0(%rdi),%rdi
>    0x00007fd173c5f034: mov $0x40,%rdx
>    0x00007fd173c5f03b: mov $0x4db2f0,%rax
>    0x00007fd173c5f042: callq *%rax
>    0x00007fd173c5f044: mov $0x1,%rax
>    0x00007fd173c5f04b: retq
> End of assembler dump.
>
> 4) observe changed traffic behavior
> Let's say, with the examples above:
>  - dummy.o does literally nothing, so no changes should be seen here,
>    except some possible slowdown.
>  - t1.o - should force a drop of all packets that don't match the
>    'dst 22.214.171.124 && udp && dst port 5000' filter.
>  - t3.o - should dump to stdout ARP packets.
>
> 5) unload some or all bpf programs:
> testpmd> bpf-unload tx 0 0
>
> 6) continue with step 3) or exit
>
> TODO list:
> ==========
>  - meson build
>  - UT for it
>  - implement proper validate()
>  - allow JIT to generate bulk version
>  - FreeBSD support
>
> Not currently supported features:
> =================================
>  - cBPF
>  - tail-pointer call
>  - eBPF MAP
>  - JIT for non X86_64 targets
>  - skb
>
> Konstantin Ananyev (5):
>   bpf: add BPF loading and execution framework
>   bpf: add JIT compilation for x86_64 ISA.
>   bpf: introduce basic RX/TX BPF filters
>   testpmd: new commands to load/unload BPF filters
>   test: add few eBPF samples
>
>  app/test-pmd/bpf_sup.h             |   25
>  app/test-pmd/cmdline.c             |  146
>  config/common_base                 |    5
>  config/common_linuxapp             |    1
>  lib/Makefile                       |    2
>  lib/librte_bpf/Makefile            |   35
>  lib/librte_bpf/bpf.c               |   52
>  lib/librte_bpf/bpf_exec.c          |  452
>  lib/librte_bpf/bpf_impl.h          |   37
>  lib/librte_bpf/bpf_jit_x86.c       | 1329
>  lib/librte_bpf/bpf_load.c          |  380
>  lib/librte_bpf/bpf_pkt.c           |  524
>  lib/librte_bpf/bpf_validate.c      |   55
>  lib/librte_bpf/rte_bpf.h           |  158
>  lib/librte_bpf/rte_bpf_ethdev.h    |   50
>  lib/librte_bpf/rte_bpf_version.map |   16
>  mk/rte.app.mk                      |    2
>  test/bpf/dummy.c                   |   20
>  test/bpf/mbuf.h                    |  556
>  test/bpf/t1.c                      |   53
>  test/bpf/t2.c                      |   30
>  test/bpf/t3.c                      |   36
>  22 files changed, 3964 insertions(+)
>  create mode 100644 app/test-pmd/bpf_sup.h
>  create mode 100644 lib/librte_bpf/Makefile
>  create mode 100644 lib/librte_bpf/bpf.c
>  create mode 100644 lib/librte_bpf/bpf_exec.c
>  create mode 100644 lib/librte_bpf/bpf_impl.h
>  create mode 100644 lib/librte_bpf/bpf_jit_x86.c
>  create mode 100644 lib/librte_bpf/bpf_load.c
>  create mode 100644 lib/librte_bpf/bpf_pkt.c
>  create mode 100644 lib/librte_bpf/bpf_validate.c
>  create mode 100644 lib/librte_bpf/rte_bpf.h
>  create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
>  create mode 100644 lib/librte_bpf/rte_bpf_version.map
>  create mode 100644 test/bpf/dummy.c
>  create mode 100644 test/bpf/mbuf.h
>  create mode 100644 test/bpf/t1.c
>  create mode 100644 test/bpf/t2.c
>  create mode 100644 test/bpf/t3.c
>
> --
> 2.13.6
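
For readers who want to see the <load-flags> syntax from step 3 ([-][J][M])
in code form, here is a minimal sketch of how such a string might be turned
into flag bits. It is not the testpmd implementation: the names LOAD_F_JIT,
LOAD_F_MBUF and parse_load_flags are hypothetical, and the sketch is more
lenient than the grammar above because it does not enforce character order.

```c
#include <stddef.h>

/* Hypothetical flag bits; these names are NOT taken from the patch series. */
enum {
	LOAD_F_JIT  = 1u << 0, /* 'J': use JIT-generated native code */
	LOAD_F_MBUF = 1u << 1, /* 'M': input parameter is a pointer to rte_mbuf */
};

/*
 * Parse a <load-flags> string such as "-", "J", "M" or "JM".
 * On success stores the flag bits in *flags and returns 0.
 * Returns -1 for a NULL argument, an empty string or an unknown character.
 * Note: character order and repetition are not checked in this sketch.
 */
int
parse_load_flags(const char *str, unsigned int *flags)
{
	unsigned int v = 0;
	size_t i;

	if (str == NULL || flags == NULL || str[0] == '\0')
		return -1;

	for (i = 0; str[i] != '\0'; i++) {
		switch (str[i]) {
		case '-': /* placeholder meaning "no flag" */
			break;
		case 'J':
			v |= LOAD_F_JIT;
			break;
		case 'M':
			v |= LOAD_F_MBUF;
			break;
		default:
			return -1;
		}
	}

	*flags = v;
	return 0;
}
```

With this sketch, "-" yields no flags (interpreter, raw packet data pointer),
"J" selects the JIT only, and "JM" selects both the JIT and the rte_mbuf
input, which matches the three bpf-load examples given in step 3.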