On Tue, Nov 24, 2020 at 10:55 AM Jason Wang <jasow...@redhat.com> wrote:

>
> On 2020/11/19 7:13 PM, Andrew Melnychenko wrote:
> > From: Andrew <and...@daynix.com>
> >
> > Also, added maintainers information.
> >
> > Signed-off-by: Yuri Benditovich <yuri.benditov...@daynix.com>
> > Signed-off-by: Andrew Melnychenko <and...@daynix.com>
> > ---
> >   MAINTAINERS       |   7 +++
> >   docs/ebpf_rss.rst | 133 ++++++++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 140 insertions(+)
> >   create mode 100644 docs/ebpf_rss.rst
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 2c22bbca5a..d93c85b867 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3111,6 +3111,13 @@ S: Maintained
> >   F: hw/semihosting/
> >   F: include/hw/semihosting/
> >
> > +EBPF:
> > +M: Jason Wang <jasow...@redhat.com>
> > +R: Andrew Melnychenko <and...@daynix.com>
> > +R: Yuri Benditovich <yuri.benditov...@daynix.com>
> > +S: Maintained
> > +F: ebpf/*
> > +
> >   Build and test automation
> >   -------------------------
> >   Build and test automation
> > diff --git a/docs/ebpf_rss.rst b/docs/ebpf_rss.rst
> > new file mode 100644
> > index 0000000000..f832defdf4
> > --- /dev/null
> > +++ b/docs/ebpf_rss.rst
> > @@ -0,0 +1,133 @@
> > +===========================
> > +eBPF RSS virtio-net support
> > +===========================
> > +
> > +RSS (Receive Side Scaling) is used to distribute network packets to guest virtqueues
> > +by calculating a packet hash. Each queue is then usually processed by a specific guest CPU core.
> > +
> > +For now, there are 2 RSS implementations in qemu:
> > +
> > +- 'in-qemu' RSS (works only if qemu itself receives network packets, i.e. vhost=off)
> > +- eBPF RSS (can also work with vhost=on)
> > +
> > +eBPF support (CONFIG_EBPF) is enabled by the 'configure' script.
> > +To enable eBPF RSS support, use './configure --enable-bpf'.
> > +
> > +If no steering BPF program is set for the kernel's TUN module, TUN selects the rx virtqueue
> > +automatically, using a lookup table built from the symmetric hash of transmitted packets.
> > +If a steering BPF program is set for TUN, the BPF code calculates the hash of the packet header and
> > +returns the number of the virtqueue to place the packet in.
> > +
> > +Simplified decision formula:
> > +
> > +.. code:: C
> > +
> > +    queue_index = indirection_table[hash(<packet data>) % <indirection_table size>]
> > +
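The decision formula above can be sketched in plain C. This is a minimal illustration, assuming the standard Toeplitz hash (the algorithm virtio-net RSS specifies), a 40-byte key, and the `%` reduction exactly as written in the formula. `toeplitz_hash()` and `select_queue()` are hypothetical helper names; they are not taken from ebpf/rss.bpf.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the Toeplitz hash (hypothetical helper, not QEMU's code).
 * For every input bit that is set, XOR in the 32-bit window of the key
 * that starts at that bit position. The key must be at least len + 4
 * bytes long (a 40-byte key covers up to 36 input bytes).
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t len)
{
    uint32_t result = 0;
    uint32_t window;

    assert(key_len >= len + 4);
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit)) {
                result ^= window;
            }
            /* Slide the 32-bit window one bit further along the key. */
            window <<= 1;
            if (key[i + 4] & (1u << bit)) {
                window |= 1;
            }
        }
    }
    return result;
}

/* queue_index = indirection_table[hash % indirection_table size] */
static uint16_t select_queue(const uint16_t *indirection_table,
                             size_t table_len, uint32_t hash)
{
    assert(table_len > 0);
    return indirection_table[hash % table_len];
}
```

For an IPv4/TCP packet, the hashed input would be the source address, destination address, source port and destination port in network byte order (12 bytes); for IPv4 without ports, only the first 8 bytes.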
> > +
> > +The hash cannot, or should not, be calculated for every packet; such packets are placed in a default queue.
> > +
> > +Note: currently, eBPF RSS does not support hash reporting.
> > +
> > +Whether eBPF RSS is used depends on the combination of vhost-net, virtio-net and tap configurations:
> > +
> > +- eBPF is used:
> > +
> > +        tap,vhost=off & virtio-net-pci,rss=on,hash=off
> > +
> > +- eBPF is used:
> > +
> > +        tap,vhost=on & virtio-net-pci,rss=on,hash=off
> > +
> > +- 'in-qemu' RSS is used:
> > +
> > +        tap,vhost=off & virtio-net-pci,rss=on,hash=on
> > +
> > +- eBPF is used, hash population feature is not reported to the guest:
> > +
> > +        tap,vhost=on & virtio-net-pci,rss=on,hash=on
> > +
> > +If CONFIG_EBPF is not set, only 'in-qemu' RSS is supported.
> > +'in-qemu' RSS is also used as a fallback if the eBPF program fails to load or cannot be set for TUN.
> > +
> > +RSS eBPF program
> > +----------------
> > +
> > +The RSS program is located in ebpf/tun_rss_steering.h as an array of 'struct bpf_insn',
> > +so the program is part of the qemu binary.
> > +The eBPF program was originally compiled with clang; its source code is located at ebpf/rss.bpf.c.
> > +Prerequisites to recompile the eBPF program (regenerate ebpf/tun_rss_steering.h):
> > +
> > +        llvm, clang, kernel source tree, python3 + pyelftools (pip3 install pyelftools)
> > +        Adjust 'linuxhdrs' in Makefile.ebpf to reflect the location of the kernel source tree
> > +
> > +        $ cd ebpf
> > +        $ make -f Makefile.ebpf
> > +
> > +Note the python script that converts the eBPF ELF object to a '.h' file, EbpfElf_to_C.py:
> > +
> > +        $ python3 EbpfElf_to_C.py rss.bpf.o tun_rss_steering
> > +
> > +The first argument of the script is the ELF object; the second is the name of the section where the eBPF program is located.
> > +The script generates a <section name>.h file with the eBPF instructions and a 'relocate array'.
> > +The 'relocate array' is an array of 'struct fixup_mapfd_t', each entry holding the name of an eBPF map and the offset of the instruction where the file descriptor of that map should be placed.
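To illustrate what the 'relocate array' is for, here is a minimal sketch of how a loader could patch a map file descriptor into the instruction array before loading the program. It assumes that `struct fixup_mapfd_t` holds a map name and an instruction index (the field names below are hypothetical; the generated header defines the real ones), and it mirrors the kernel's `struct bpf_insn` layout locally so the example stays self-contained. In the kernel ABI, a 64-bit immediate load whose `src_reg` is `BPF_PSEUDO_MAP_FD` (1) carries a map file descriptor in `imm`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the kernel's struct bpf_insn (see <linux/bpf.h>). */
struct bpf_insn_sketch {
    uint8_t code;
    uint8_t dst_reg : 4;
    uint8_t src_reg : 4;
    int16_t off;
    int32_t imm;
};

#define SKETCH_BPF_PSEUDO_MAP_FD 1

/* Hypothetical field names for the generated relocation entries. */
struct fixup_mapfd_t {
    const char *map_name;
    size_t instruction_num;
};

/*
 * Patch every relocation whose map_name equals `name` with `map_fd`.
 * Returns the number of instructions patched.
 */
static size_t apply_map_fixup(struct bpf_insn_sketch *insns, size_t n_insns,
                              const struct fixup_mapfd_t *fixups,
                              size_t n_fixups, const char *name, int map_fd)
{
    size_t patched = 0;

    for (size_t i = 0; i < n_fixups; i++) {
        if (strcmp(fixups[i].map_name, name) != 0) {
            continue;
        }
        assert(fixups[i].instruction_num < n_insns);
        /* First slot of the two-slot BPF_LD_IMM64 instruction. */
        insns[fixups[i].instruction_num].src_reg = SKETCH_BPF_PSEUDO_MAP_FD;
        insns[fixups[i].instruction_num].imm = map_fd;
        patched++;
    }
    return patched;
}
```

A loader would call this once per map after creating it, then hand the patched array to the kernel.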
> > +
> > +The current eBPF RSS implementation uses bounded loops (backward jump instructions), which are supported only by recent kernels.
> > +Overall, eBPF RSS works on kernels 5.8+.
>
>
> This reminds me that we probably need to probe this ability via
> configure script.
>
>
I'm not sure. One can boot with an older kernel, build qemu and run it with
a newer kernel, correct?



> Thanks
>
>
> > +
> > +eBPF RSS implementation
> > +-----------------------
> > +
> > +The eBPF RSS loading functionality is located in ebpf/ebpf_rss.c and ebpf/ebpf_rss.h.
> > +
> > +`struct EBPFRSSContext` holds a libbpf context pointer and 4 file descriptors:
> > +
> > +- ctx - pointer to the libbpf context.
> > +- program_fd - file descriptor of the eBPF RSS program.
> > +- map_configuration - file descriptor of the 'configuration' map. This map contains one element of 'struct EBPFRSSConfig', which determines the eBPF program's behavior.
> > +- map_toeplitz_key - file descriptor of the 'Toeplitz key' map. It contains one element: the 40-byte key for the hashing algorithm.
> > +- map_indirections_table - file descriptor of the indirection table map: 128 elements of queue indexes.
> > +
> > +`struct EBPFRSSConfig` fields:
> > +
> > +- redirect - "boolean" value: whether the hash should be calculated; if false, `default_queue` is used as the final decision.
> > +- populate_hash - not used for now; eBPF RSS doesn't support hash reporting.
> > +- hash_types - bitmask of hash types; see the `VIRTIO_NET_RSS_HASH_TYPE_*` defines. If the hash should not be calculated for a packet, `default_queue` is used.
> > +- indirections_len - length of the indirection table, maximum 128.
> > +- default_queue - the queue index used for packets that shouldn't be hashed. For some packets (e.g. ARP), the hash can't be calculated.
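The field descriptions above imply a decision order that can be sketched as follows. The struct only mirrors the listed fields; its exact types and packing are an assumption, as is the hypothetical `rss_decide()` helper. The real eBPF program parses the packet itself and may apply further per-protocol fallbacks that this simplified sketch omits. The `VIRTIO_NET_RSS_HASH_TYPE_*` bit values are those of the virtio specification.

```c
#include <assert.h>
#include <stdint.h>

/* virtio-net RSS hash type bits (virtio specification). */
#define VIRTIO_NET_RSS_HASH_TYPE_IPv4   (1u << 0)
#define VIRTIO_NET_RSS_HASH_TYPE_TCPv4  (1u << 1)
#define VIRTIO_NET_RSS_HASH_TYPE_UDPv4  (1u << 2)
#define VIRTIO_NET_RSS_HASH_TYPE_IPv6   (1u << 3)
#define VIRTIO_NET_RSS_HASH_TYPE_TCPv6  (1u << 4)
#define VIRTIO_NET_RSS_HASH_TYPE_UDPv6  (1u << 5)

/* Assumed layout mirroring the documented fields of struct EBPFRSSConfig. */
struct EBPFRSSConfigSketch {
    uint8_t redirect;
    uint8_t populate_hash;     /* unused: no hash reporting */
    uint32_t hash_types;
    uint16_t indirections_len; /* at most 128 */
    uint16_t default_queue;
};

/*
 * packet_type: the VIRTIO_NET_RSS_HASH_TYPE_* bit matching the parsed
 * packet, or 0 if the packet cannot be hashed (e.g. ARP).
 * hash: the Toeplitz hash computed for that packet type.
 */
static uint16_t rss_decide(const struct EBPFRSSConfigSketch *cfg,
                           const uint16_t *indirections_table,
                           uint32_t packet_type, uint32_t hash)
{
    assert(cfg->indirections_len <= 128);
    if (!cfg->redirect || cfg->indirections_len == 0) {
        return cfg->default_queue;
    }
    if ((cfg->hash_types & packet_type) == 0) {
        /* Hash type not enabled, or packet not hashable. */
        return cfg->default_queue;
    }
    return indirections_table[hash % cfg->indirections_len];
}
```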
> > +
> > +Functions:
> > +
> > +- `ebpf_rss_init()` - sets ctx to NULL, which indicates that the EBPFRSSContext is not loaded.
> > +- `ebpf_rss_load()` - creates 3 maps and loads the eBPF program from tun_rss_steering.h. Returns 'true' on success. After that, program_fd can be used to set steering for TAP.
> > +- `ebpf_rss_set_all()` - sets the values of the eBPF maps. The length of `indirections_table` is taken from EBPFRSSConfig; `toeplitz_key` is an array of VIRTIO_NET_RSS_MAX_KEY_SIZE (40) bytes.
> > +- `ebpf_rss_unload()` - closes all file descriptors and sets ctx to NULL.
> > +
> > +Simplified eBPF RSS workflow:
> > +
> > +.. code:: C
> > +
> > +    struct EBPFRSSConfig config = {0}; /* populate_hash stays 0 */
> > +    config.redirect = 1;
> > +    config.hash_types = VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | VIRTIO_NET_RSS_HASH_TYPE_TCPv4;
> > +    config.indirections_len = VIRTIO_NET_RSS_MAX_TABLE_LEN;
> > +    config.default_queue = 0;
> > +
> > +    uint16_t table[VIRTIO_NET_RSS_MAX_TABLE_LEN] = {...};
> > +    uint8_t key[VIRTIO_NET_RSS_MAX_KEY_SIZE] = {...};
> > +
> > +    struct EBPFRSSContext ctx;
> > +    ebpf_rss_init(&ctx);
> > +    if (!ebpf_rss_load(&ctx)) {
> > +        /* fall back to 'in-qemu' RSS */
> > +        ...
> > +    }
> > +    ebpf_rss_set_all(&ctx, &config, table, key);
> > +    if (net_client->info->set_steering_ebpf != NULL) {
> > +        net_client->info->set_steering_ebpf(net_client, ctx.program_fd);
> > +    }
> > +    ...
> > +    ebpf_rss_unload(&ctx);
> > +
> > +
> > +NetClientState SetSteeringEBPF()
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +For now, the `set_steering_ebpf()` method is supported only by the Linux TAP NetClientState. The method takes an eBPF program file descriptor as an argument.
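For the Linux TAP backend, attaching a steering program comes down to the kernel's TUNSETSTEERINGEBPF ioctl, which takes a pointer to the program file descriptor; passing -1 detaches the program. A minimal sketch, assuming `tap_fd` is an open TAP queue file descriptor; `tap_set_steering_sketch()` is a hypothetical helper and not QEMU's own function.

```c
#include <assert.h>
#include <errno.h>
#include <linux/if_tun.h>
#include <sys/ioctl.h>

/*
 * Attach (prog_fd >= 0) or detach (prog_fd == -1) a steering eBPF
 * program on a TAP file descriptor. Returns 0 on success, -errno on
 * failure.
 */
static int tap_set_steering_sketch(int tap_fd, int prog_fd)
{
    assert(prog_fd >= -1);
    if (ioctl(tap_fd, TUNSETSTEERINGEBPF, &prog_fd) < 0) {
        return -errno;
    }
    return 0;
}
```

If the ioctl fails (for example on a kernel without steering eBPF support), a caller would fall back to 'in-qemu' RSS, as described earlier.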
>
>
