Re: [dpdk-dev] [dpdk-users] is i40evf support promisc? // DPDK 20.11 - i40evf: No response for 14

2021-11-04 Thread David Christensen




On 10/26/21 12:08 AM, liaobiting wrote:
Hi: Please help to see this DPDK problem. And I want to know whether 
i40e vf support promisc or not. Thanks a lot. From: liaobiting 
 Subject: Please help to see this DPDK 
problem//Reply: DPDK 20.11 - i40evf: No ZjQcmQRYFpfptBannerStart

This Message Is From an External Sender
This message came from outside your organization.
ZjQcmQRYFpfptBannerEnd

Hi:

 Please help with this DPDK problem. I want to know whether the 
i40e VF supports promiscuous mode. Thanks a lot.


According to the README at 
https://downloadmirror.intel.com/28381/eng/readme.txt you need to enable 
a VF as trusted for promiscuous mode.  Looks like this:


# ip link set dev eth0 vf 1 trust [on|off]
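
A fuller sequence, as a sketch (the PF name eth0, VF index 1, and the VF's 
PCI address are assumptions, and the testpmd step assumes the i40e VF PMD 
honors the promiscuous request once the VF is trusted):

# on the host, mark the VF as trusted and verify the flag
sudo ip link set dev eth0 vf 1 trust on
ip link show dev eth0 | grep "vf 1"

# in the DPDK application attached to the VF (testpmd shown), enable promiscuous mode
sudo dpdk-testpmd -a <vf_pci_addr> -- -i
testpmd> set promisc all on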

Did you enable the VF as trusted before assigning the NIC to the OVS switch?

Dave


Re: If or how one gets an IP address associated with a vfio-pci bound NIC

2021-11-04 Thread David Christensen
I'd appreciate one additional bit of information if possible. Once the 
DPDK NIC is bound to vfio-pci the DPDK Linux manual at
https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#vfio mentions
setup steps including:


Create the desired number of VF devices
echo 2 > /sys/bus/pci/devices/:86:00.0/sriov_numvfs

My question: what is the upper bound on the number of VF devices? What's 
the thinking process? For example, maybe one of these approaches makes sense?

- VF device count is bounded from above by the number of RX/TX queues
- VF device count is bounded from above by the amount of on-NIC memory
- VF device count is bounded from above by the manufacturer: each NIC has some 
max; read the specs
- VF device count is like the number of ports on a UNIX host: thousands are 
available and what you need depends on software: how many concurrent 
connections are needed?


The upper bound on Virtual Functions (VFs) comes from the hardware 
itself.  It's advertised to the OS through the PCIe configuration 
register space.  You can use the lspci utility to discover this 
information.  For example, running "lspci | grep Ethernet" shows the 
NICs on my system:


:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family 
[ConnectX-5 Ex]
:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family 
[ConnectX-5 Ex]
0003:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries 
NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0003:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries 
NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0003:01:00.2 Ethernet controller: Broadcom Inc. and subsidiaries 
NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0003:01:00.3 Ethernet controller: Broadcom Inc. and subsidiaries 
NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0005:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries 
NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries 
NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family 
[ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family 
[ConnectX-5 Ex]
0034:01:00.0 Ethernet controller: Intel Corporation Ethernet Controller 
XL710 for 40GbE QSFP+ (rev 02)
0034:01:00.1 Ethernet controller: Intel Corporation Ethernet Controller 
XL710 for 40GbE QSFP+ (rev 02)


Focusing on the Intel XL710 NIC, I can look at the SR-IOV capabilities 
values:


sudo lspci -vvv -s 0034:01:00.0
0034:01:00.0 Ethernet controller: Intel Corporation Ethernet Controller 
XL710 for 40GbE QSFP+ (rev 02)

Subsystem: Intel Corporation Ethernet Converged Network Adapter XL710-Q2
...
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
		Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00

VF offset: 16, stride: 1, Device ID: 154c
Supported Page Size: 0553, System Page Size: 0010
Region 0: Memory at 00062240 (64-bit, prefetchable)
Region 3: Memory at 000622400100 (64-bit, prefetchable)
VF Migration: offset: , BIR: 0

The "Total VFs" value indicates how many VFs can be enabled for this NIC 
and indicates the upper bound you can use when enabling VFs with the 
echo command you mention above.  Other NICs may have different values 
depending on their individual hardware capabilities.
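
The same limit is also exposed through sysfs, which is handier for 
scripting; a minimal sketch against the XL710 address above (assuming the 
PF is still bound to its kernel driver):

# matches the "Total VFs" value reported by lspci
cat /sys/bus/pci/devices/0034:01:00.0/sriov_totalvfs

# enable a VF count no greater than that limit
echo 2 | sudo tee /sys/bus/pci/devices/0034:01:00.0/sriov_numvfs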



DPDK must have an API that programmatically discovers the PFs and VFs per PF.


Support for SR-IOV is managed by the Linux kernel, not DPDK.  Once a VF 
is enabled under Linux, DPDK treats it just like a physical function 
(PF) NIC, assuming the poll-mode driver (PMD) written by the hardware 
manufacturer supports operating on the VF.
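
If you want to discover the PF/VF relationship from a script rather than 
from DPDK, sysfs exposes it directly; a small sketch (the VF address shown 
is illustrative):

# each enabled VF appears as a virtfn<N> symlink under its parent PF
ls -l /sys/bus/pci/devices/0034:01:00.0/virtfn*

# and each VF points back to its parent through a physfn symlink
ls -l /sys/bus/pci/devices/0034:01:02.0/physfn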


Finally: is a VF device duplex (sends and receives)? Or just RX or just 
TX only?


In my experience VFs support both send and receive.  There is also some 
Linux support for limiting bandwidth on VFs that support the capability 
(see "ip link set vf" on https://linux.die.net/man/8/ip).


Dave


Re: If or how one gets an IP address associated with a vfio-pci bound NIC

2021-11-03 Thread David Christensen




On 11/2/21 4:14 PM, fwefew 4t4tg wrote:

I'm trying to use DPDK on AWS i3.metal instances. I have the code built 
with AWS vfio-patches. In order to be logged into the machine on one NIC 
while having a free ENA NIC for DPDK, I attached a second NIC.


./dpdk-devbind.py is able to see the second NIC, and bind to it. *All 
that's working fine. However, by default this 2nd NIC does not have an 
IP address.*


Meanwhile code needs a hostname or IP address of the client and server. 
How do I get an IP address associated with this 2nd NIC? 


I don't think you understand the intent behind the DPDK framework. 
You're passing control of the NIC to a user application.  That means you 
don't receive any benefits of the kernel's networking stack.  The user 
application you use will need to handle all network services, including 
its own TCP/IP stack if required.


If you're using the bundled DPDK testpmd application then there's no 
need to assign an IP address to the interface.  The testpmd app can 
build and send/receive ANY type of network packet, though it's mostly 
only used to verify functionality provided by the DPDK framework.  If 
you're WRITING a network application then DPDK might be what you want, 
but if you have a specific network function in mind then you're likely 
looking for an application that USES DPDK.
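
As a sketch of what that looks like with your ENA port (the PCI address 
and core list are assumptions; recent DPDK releases use -a for the device 
allow list, older ones used -w):

sudo ./build/app/dpdk-testpmd -l 0-1 -n 4 -a 0000:05:00.0 -- -i
testpmd> show port info 0
testpmd> start tx_first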


And do I need
to do some sys-admin work to ensure traffic in and out of the DPDK bound 
vfio-pci NIC is kept separate from the first NIC?


As far as I can see the correct approach is to:

# setup second NIC to have an IP address and make sure UP before 
dpdk-devbind:

* sudo ip addr add  dev ens1 label ens1:1
* sudo ip link set ens1 up

before I do DPDK bind.

The NIC, when AWS adds it, starts off down without an IP address by default:

ubuntu$ lspci | grep Ether

04:00.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
05:00.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
ubuntu$ sudo ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN 
group default qlen 1000

     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8  scope host lo
        valid_lft forever preferred_lft forever
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
2: ens785:  mtu 9001 qdisc mq state UP 
group default qlen 1000

     link/ether 0a:0f:1f:db:ca:73 brd ff:ff:ff:ff:ff:ff
     inet 172.31.17.144/20  brd 172.31.31.255 
scope global dynamic ens785

        valid_lft 3544sec preferred_lft 3544sec
     inet6 fe80::80f:1fff:fedb:ca73/64 scope link
        valid_lft forever preferred_lft forever
*3: ens1:  mtu 1500 qdisc noop state DOWN group 
default qlen 1000

     link/ether 0a:06:15:14:95:05 brd ff:ff:ff:ff:ff:ff
*
Once I bind 'ens1' dpdk-devbind reports it as bound -AND- it no longer 
appears in `ip a`:


This is expected.  You've removed the NIC from the kernel's control and 
bound it to the vfio_pci driver, which allows the NIC to be controlled 
entirely by a user application.




Network devices using DPDK-compatible driver

:05:00.0 'Elastic Network Adapter (ENA) ec20' drv=vfio-pci unused=ena

Network devices using kernel driver
===
:04:00.0 'Elastic Network Adapter (ENA) ec20' if=ens785 drv=ena 
unused=vfio-pci *Active*


ubuntu@ip-172-31-17-144:~/Scripts$ ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN 
group default qlen 1000

     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8  scope host lo
        valid_lft forever preferred_lft forever
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
2: ens785:  mtu 9001 qdisc mq state UP 
group default qlen 1000

     link/ether 0a:0f:1f:db:ca:73 brd ff:ff:ff:ff:ff:ff
     inet 172.31.17.144/20  brd 172.31.31.255 
scope global dynamic ens785

        valid_lft 3314sec preferred_lft 3314sec
     inet6 fe80::80f:1fff:fedb:ca73/64 scope link
        valid_lft forever preferred_lft forever


Everything seems in order here.  If you can share what you're trying to 
accomplish with DPDK we might be able to provide better guidance.


Dave


[dpdk-users] Connecting VM to example/vhost Application Fails with "Failed to unlink"

2021-03-03 Thread David Christensen
Trying to connect a VM to the example/vhost switch application.  The 
dpdk-vhost command line and output are as follows:


$ sudo ./build/examples/dpdk-vhost -l 120-127 -n 4 --socket-mem 
0,0,0,0,0,0,0,0,2048 --no-pci  -- -p 1 --socket-file /tmp/sock0

EAL: Detected 160 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: No legacy callbacks, legacy socket not created
VHOST_PORT:
Specified port number(1) exceeds total system port number(0)
VHOST_DATA: Procesing on Core 121 started
VHOST_DATA: Procesing on Core 122 started
VHOST_DATA: Procesing on Core 126 started
VHOST_DATA: Procesing on Core 125 started
VHOST_DATA: Procesing on Core 123 started
VHOST_DATA: Procesing on Core 127 started
VHOST_CONFIG: vhost-user server: socket created, fd: 48
VHOST_DATA: Procesing on Core 124 started
VHOST_CONFIG: bind to /tmp/sock0

VM configuration in the libvirt XML is as follows:

  [libvirt <interface type='vhostuser'> definition for /tmp/sock0 stripped
  by the mail archive]


VM fails to start with the following error:

$ sudo virsh start --console dpdk-server-0
error: Failed to start domain dpdk-server-0
error: internal error: process exited while connecting to monitor: 
2021-03-03T22:16:20.767940Z qemu-system-ppc64: -chardev 
socket,id=charnet0,path=/tmp/sock0,server: Failed to unlink socket 
/tmp/sock0: Operation not permitted


I'm able to start the VM with OVS/DPDK (2.15.0/20.11) and testpmd from 
DPDK, but not the vhost switch application.  Any suggestions?
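
For anyone hitting the same error, two quick checks worth running 
(assumptions only; the thread does not record a confirmed fix):

# who owns the stale socket and what SELinux label does it carry?
ls -lZ /tmp/sock0

# any recent SELinux denials against qemu when it tried to unlink the socket?
sudo ausearch -m avc -ts recent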


Dave


Re: [dpdk-users] Fwd: use VF in promiscuous mode in dpdk to receive all traffic received by PF

2021-01-14 Thread David Christensen




On 1/13/21 9:59 PM, Myth Ren wrote:

Hi, all
Is it possible to mirror traffic from a switch/router to a PF and then have
a DPDK-based program read packets from a VF related to that PF, presuming
the traffic is mirrored from the PF to the VF (promiscuous), while the PF
device is still managed by the kernel driver (if that's possible) or by a
UIO driver? At a minimum, restarting the DPDK-based program should not
influence the PF device state.


You should be specific about the NIC/PMD you're using here.  One use 
case for PF/VF is that the PF will run in the host operating system and 
the VF will be passed-through to a virtual machine.  Because of this, 
NICs will often support a feature to actively prevent a VF from snooping 
on the traffic seen by the PF for security, limiting it to only 
broadcast/multicast traffic and unicast traffic with a matching MAC 
address.  The ability for a VF to enable promiscuous mode and see PF 
traffic, if it is supported on the NIC at all, is likely to be a 
NIC/PMD-specific implementation detail.


And why the need for a VF in the first place? If the switch/router is 
external to the box and forwarding traffic into the PF, why not just 
catch the traffic from the PF?  You'll need a second PF anyway to send 
the traffic onward (like a bump-in-the-wire implementation), unless 
you're mangling the packets for some reason to change their address, 
adding a VLAN or other tag, etc.  A drawing of what you're trying to 
implement would be useful here.


Dave


Re: [dpdk-users] DPDK 20.05 MLX5 'no Verbs device matches PCI device , are kernel drivers loaded'

2020-10-26 Thread David Christensen

I am trying to test mlx5 performance on Ubuntu 16.04, and when I run 
testpmd with the Mellanox NIC, some errors occur.




EAL: Probe PCI driver: net_mlx5 (15b3:1018) device: :af:02.3 (socket 1)


This PCI device ID, 15b3:1018, represents a virtual function (VF) 
according to https://pci-ids.ucw.cz/read/PC/15b3/1018.  Why are you 
using a VF in this situation?  Are you running inside a VM or on a bare 
metal host?




net_mlx5: mlx5.c:3322: mlx5_pci_probe(): no Verbs device matches PCI device 
:af:02.3, are kernel drivers loaded?

EAL: Requested device :af:02.3 cannot be used

EAL: Probe PCI driver: net_mlx5 (15b3:1018) device: :af:02.4 (socket 1)

net_mlx5: mlx5.c:3322: mlx5_pci_probe(): no Verbs device matches PCI device 
:af:02.4, are kernel drivers loaded?

EAL: Requested device :af:02.4 cannot be used

EAL: Bus (pci) probe failed.


The code in mlx5_os_pci_probe() suggests that the DPDK mlx5 driver 
didn't find any Infiniband devices on your system.  Is your adapter 
enabled for Infiniband on the host system?  What does "lspci | grep 
Mellanox" return?  What about "ibv_devinfo"?


Dave


Re: [dpdk-users] rte_eth_stats counters

2020-09-21 Thread David Christensen

On 9/16/20 8:42 PM, Gerry Wan wrote:

Hi,

I'm testing out the maximum RX throughput my application can handle and am
using rte_eth_stats_get() to measure the point at which RX packets start
getting dropped. I have a traffic generator directly connected to my RX
port.

I've noticed that imissed is the number of packets that are dropped because
the hardware queues are full, while ipackets is the number of packets successfully
received and agrees with the total number of packets retrieved from calls
to rte_eth_rx_burst(). I'm not sure exactly what ierrors is supposed to
count, but so far I have not seen this go beyond 0?

I have been interpreting the sum of (ipackets + imissed + ierrors) = itotal
as the total number of packets hitting the port. However, I've noticed that
when throughput gets too high, imissed will remain 0 while itotal is
smaller than the number of packets sent by the traffic generator. I've
ruled out connection issues because increasing the number of RSS queues
seems to fix the problem (up to a certain threshold before itotal again
becomes smaller than the number sent), but I don't understand why. If it is
not dropped in HW because the queues are full (since imissed = 0), where
are the packets being dropped and is there a way I can count these?

I am using DPDK 20.08 with a Mellanox CX-5, RSS queue size = 4096


When the application's buffers fill up, the HW buffers start to fill up. 
When the HW buffers are full, the PHY responds by generating flow control 
frames or simply dropping packets.  You could experiment by 
enabling/disabling flow control to verify that the packet counts are 
correct when flow control is enabled.


You could also look at the rx_discards_phy counter and contrast it with 
the rx_out_of_buffer statistic:


https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters

My read is that rx_out_of_buffer indicates that the HW doesn't have any 
RX descriptors available, possibly because of PCIe congestion or because 
the app's receive queue is empty.  On the other hand, rx_discards_phy 
indicates that the HW buffers are full.  I don't see the rx_discards_phy 
used in any stats, only available as an xstat.
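
A sketch of how to pull those counters (the netdev name is a placeholder; 
the xstats are also visible from inside testpmd):

# kernel view of the same CX-5 port
ethtool -S <ifname> | grep -E 'rx_discards_phy|rx_out_of_buffer'

# DPDK view, from the testpmd prompt
testpmd> show port xstats 0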


Dave


Re: [dpdk-users] Problem with dpdk vf pollmode driver - Ethernet controller: Intel Corporation XL710/X710 Virtual Function

2020-09-03 Thread David Christensen

On 9/2/20 10:26 PM, Venumadhav Josyula wrote:

We have SR-IOV with a few virtual functions mapped to a VM. We are seeing
link issues; after we started the DPDK-based application we see the
following


Ensure that VF link state is set appropriately.  The following "ip" 
command allows you to control whether the VF link follows the physical 
function link state, is always up, or is forced down (your system might 
be configured for the "down" state):


ip link set <pf device> vf <vf index> state auto|enable|disable

Try enabling "auto" or "enable" on the host and see if that helps.

Dave


Re: [dpdk-users] DPDK application fails to start in KVM

2020-08-25 Thread David Christensen

On 8/25/20 10:02 AM, Jatin Sahu wrote:

Error details:
ERROR: This system does not support "RDRAND".
Please check that RTE_MACHINE is set correctly.
EAL: FATAL: unsupported cpu type.
EAL: unsupported cpu type.


RDRAND is an x86 CPU instruction (https://en.wikipedia.org/wiki/RDRAND).
Your problem is likely related to the CPU type you selected for your VM. 
Try running "lscpu | grep rdrand" in your VM.  On my VM with "Common KVM 
processor" the RDRAND flag is not available.


Dave


[dpdk-users] Vhost PMD Performance Doesn't Scale as Expected

2020-08-12 Thread David Christensen
I'm examining performance between two VMs connected with a vhost 
interface on DPDK 20.08 and testpmd.  Each VM (client-0, server-0) has 4 
VCPUs, 4 RX/TX queues per port, 4GB RAM, and runs 8 containers, each 
with an instance of qperf running the tcp_bw test.  The configuration is 
targeting all CPU/memory activity for NUMA node 1.


When I look at the cumulative throughput as I increase the number of 
qperf pairs I'm noticing that the performance doesn't appear to scale as 
I had hoped.  Here's a table with some results:


concurrent qperf pairs
msg_size      1            2            4            8
8,192         12.74 Gb/s   21.68 Gb/s   27.89 Gb/s   30.94 Gb/s
16,384        13.84 Gb/s   24.06 Gb/s   28.51 Gb/s   30.47 Gb/s
32,768        16.13 Gb/s   24.49 Gb/s   28.89 Gb/s   30.23 Gb/s
65,536        16.19 Gb/s   22.53 Gb/s   29.79 Gb/s   30.46 Gb/s
131,072       15.37 Gb/s   23.89 Gb/s   29.65 Gb/s   30.88 Gb/s
262,144       14.73 Gb/s   22.97 Gb/s   29.54 Gb/s   31.28 Gb/s
524,288       14.62 Gb/s   23.39 Gb/s   28.70 Gb/s   30.98 Gb/s

Can anyone suggest a possible configuration change that might improve 
performance or is this generally what is expected?  I was expecting 
performance to nearly double as I move from 1 to 2 to 4 queues.


Even single queue performance is below Intel's published performance 
results (see 
https://fast.dpdk.org/doc/perf/DPDK_20_05_Intel_virtio_performance_report.pdf), 
though I was unable to get the vhost-switch example application to run 
due to an mbuf allocation error for the i40e PMD and had to revert to 
the testpmd app.


Configuration details below.

Dave

/proc/cmdline:
--
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-147.el8.x86_64 
root=/dev/mapper/rhel-root ro intel_iommu=on iommu=pt 
default_hugepagesz=1G hugepagesz=1G hugepages=64 crashkernel=auto 
resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap 
rhgb quiet =1 nohz=on nohz_full=8-15,24-31 rcu_nocbs=8-15,24-31 
tuned.non_isolcpus=00ff00ff intel_pstate=disable nosoftlockup


testpmd command-line:
-
~/src/dpdk/build/app/dpdk-testpmd -l 7,24-31 -n 4 --no-pci --vdev 
'net_vhost0,iface=/tmp/vhost-dpdk-server-0,dequeue-zero-copy=1,tso=1,queues=4' 
--vdev 
'net_vhost1,iface=/tmp/vhost-dpdk-client-0,dequeue-zero-copy=1,tso=1,queues=4' 
 -- -i --nb-cores=8 --numa --rxq=4 --txq=4


testpmd forwarding core mapping:

Start automatic packet forwarding
io packet forwarding - ports=2 - cores=8 - streams=8 - NUMA support 
enabled, MP allocation mode: native

Logical Core 24 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
Logical Core 25 (socket 1) forwards packets on 1 streams:
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 26 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01
Logical Core 27 (socket 1) forwards packets on 1 streams:
  RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
Logical Core 28 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=2 (socket 0) -> TX P=1/Q=2 (socket 0) peer=02:00:00:00:00:01
Logical Core 29 (socket 1) forwards packets on 1 streams:
  RX P=1/Q=2 (socket 0) -> TX P=0/Q=2 (socket 0) peer=02:00:00:00:00:00
Logical Core 30 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=3 (socket 0) -> TX P=1/Q=3 (socket 0) peer=02:00:00:00:00:01
Logical Core 31 (socket 1) forwards packets on 1 streams:
  RX P=1/Q=3 (socket 0) -> TX P=0/Q=3 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=8 - nb forwarding ports=2
  port 0: RX queue number: 4 Tx queue number: 4
Rx offloads=0x0 Tx offloads=0x0
RX queue: 0
  RX desc=0 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0  wthresh=0
  RX Offloads=0x0
TX queue: 0
  TX desc=0 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0  wthresh=0
  TX offloads=0x0 - TX RS bit threshold=0
  port 1: RX queue number: 4 Tx queue number: 4
Rx offloads=0x0 Tx offloads=0x0
RX queue: 0
  RX desc=0 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0  wthresh=0
  RX Offloads=0x0
TX queue: 0
  TX desc=0 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0  wthresh=0
  TX offloads=0x0 - TX RS bit threshold=0

lscpu:
--
Architecture:x86_64
CPU op-mode(s):  32-bit, 64-bit
Byte Order:  Little Endian
CPU(s):  32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):   2
NUMA node(s):2
Vendor ID:   GenuineIntel
CPU family:  6
Model:   85
Model name:  Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
Stepping:4
CPU MHz: 2400.075
BogoMIPS:4200.00
Virtualization:  VT-x
L1d cache:   32K
L1i cache:   32K
L2 cache:

Re: [dpdk-users] Mellanox CX-5 failed with DPDK 20.05, but ok with DPDK 19.11.3

2020-07-13 Thread David Christensen

Dear DPDK Users,

I met a problem with DPDK 20.05 + Mellanox CX-5 NIC. Does anyone know how to 
fix it? Thanks a lot!

Problem:
The Mellanox CX-5 card works with DPDK 19.11.3 but fails with DPDK 20.05:
20.05 reports no Ethernet device, while 19.11.3 is fine.


The Mellanox CX-5 poll mode driver (PMD) is not built by default in some 
configurations.  When using meson to build the framework the mlx5 
dependencies are usually detected automatically and the PMD is built 
correctly.  On the other hand, if you're building with GNU make then the 
MLX5 PMD needs to be specifically enabled by modifying a configuration 
file.  Refer to the MLX5 PMD documentation for more details:


https://doc.dpdk.org/guides/nics/mlx5.html
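
For a make-based build, that usually means flipping the option in 
config/common_base and rebuilding; a sketch (the build target is an example):

sed -i 's/CONFIG_RTE_LIBRTE_MLX5_PMD=n/CONFIG_RTE_LIBRTE_MLX5_PMD=y/' config/common_base
make install T=x86_64-native-linuxapp-gcc -j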

Dave



Re: [dpdk-users] Poor performance when using OVS with DPDK

2020-06-26 Thread David Christensen

 > Why don't you reserve any CPUs for OVS/DPDK or VM usage?  All
 > published
 > performance white papers recommend settings for CPU isolation like
 > this
 > Mellanox DPDK performance report:
 >
 > 
https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf 


 >
 > For their test system:
 >
 > isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0
 > intel_pstate=disable nohz_full=24-47
 > rcu_nocbs=24-47 rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G
 > hugepages=64 audit=0
 > nosoftlockup
 >
 > Using the tuned service (CPU partitioning profile) make this process
 > easier:
 >
 > https://tuned-project.org/ 


 >
Nice tutorial, thanks for sharing. I have checked it and configured our
server like this:

isolcpus=12-19 intel_idle.max_cstate=0 processor.max_cstate=0
nohz_full=12-19 rcu_nocbs=12-19 intel_pstate=disable
default_hugepagesz=1G hugepagesz=1G hugepages=24 audit=0 nosoftlockup
intel_iommu=on iommu=pt rcu_nocb_poll


Even though our servers are NUMA-capable and NUMA-aware, we only have
one CPU installed in one socket.
And one CPU has 20 physical cores (40 threads), so I figured out to use
the "top-most" cores for DPDK/OVS, that's the reason of isolcpus=12-19


You can never have too many cores.  On POWER systems I'll sometimes 
reserve 76 out of 80 available cores to improve overall throughput.



 > >
 > > ./usertools/dpdk-devbind.py --status
 > > Network devices using kernel driver
 > > ===
 > > :b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2
 > drv=mlx5_core
 > > unused=igb_uio,vfio-pci
 > >
 > > Due to the way Mellanox cards and their driver work, I have not bound
 > > igb_uio to the interface; however, the uio, igb_uio and vfio-pci kernel
 > > modules are loaded.
 > >
 > >
 > > Relevant part of the VM-config for Qemu/KVM
 > > ---
 > > [libvirt VM-config XML stripped by the mail archive; only the value 4096 survived]
 >
 > Where did you get these CPU mapping values?  x86 systems typically
 > map
 > even-numbered CPUs to one NUMA node and odd-numbered CPUs to a
 > different
 > NUMA node.  You generally want to select CPUs from the same NUMA node
 > as
 > the mlx5 NIC you're using for DPDK.
 >
 > You should have at least 4 CPUs in the VM, selected according to the
 > NUMA topology of the system.
as per my answer above, our system has no secondary NUMA node, all
mappings are to the same socket/CPU.

 >
 > Take a look at this bash script written for Red Hat:
 >
 > 
https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh 


 >
 > It gives you a good starting reference which CPUs to select for the
 > OVS/DPDK and VM configurations on your particular system.  Also
 > review
 > the Ansible script pvp_ovsdpdk.yml, it provides a lot of other
 > useful
 > steps you might be able to apply to your Debian OS.
 >
 > > [libvirt XML stripped by the mail archive: NUMA cell with
 > > memAccess='shared' and a vhost-user interface on
 > > /usr/local/var/run/openvswitch/vhostuser]
 >
 > Is there a requirement for mergeable RX buffers?  Some PMDs like
 > mlx5
 > can take advantage of SSE instructions when this is disabled,
 > yielding
 > better performance.
Good point, there is no requirement, I just took an example config and
thought it was necessary for the driver queue settings.


That's how we all learn :-)


 >
 > >
 >
 > I don't see hugepage usage in the libvirt XML.  Something similar to:
 >
 >    8388608
 >    8388608
 >    [memoryBacking/hugepages elements stripped by the mail archive]
I did not copy this part of the XML, but we have hugepages configured
properly.
 >
 >
 > > ---
 > > OVS Start Config
 > > ---
 > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
 > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
 > mem="4096,0"
 > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-
 > mask=0xff
 > > ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e
 >
 > These two masks shouldn't overlap:
 > 
https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/ 

Re: [dpdk-users] Poor performance when using OVS with DPDK

2020-06-25 Thread David Christensen




On 6/24/20 4:03 AM, Vipul Ujawane wrote:

Dear all,

I am observing very low performance when running OVS-DPDK compared
to OVS running with the kernel datapath.
I have OvS version 2.13.90 compiled from source with the latest stable DPDK
v19.11.3 on a stable Debian system running kernel 4.19.0-9-amd64 (real
version:4.19.118).

I have tried to use the latest released OvS as well (2.12) with the same
LTS DPDK. As a last resort, I have also tried an older kernel to see whether
that makes any difference (4.19.0-8-amd64, real version 4.19.98).

I have not been able to troubleshoot the problem, and kindly request your
help regarding the same.

HW configuration

We have two totally identical servers (Debian stable, Intel(R) Xeon(R)
Gold 6230 CPU, 96G Mem), each running a KVM virtual machine. On the hypervisor
layer, we have OvS for traffic routing. The servers are connected directly
via a Mellanox ConnectX-5 (1x100G).
OVS Forwarding tables are configured for simple port-forwarding only to
avoid any packet processing-related issue.

Problem
===
When both servers are running OVS-Kernel at the hypervisor layer and VMs
are connected to it via libvirt and virtio interfaces, the
VM->Server1->Server2->VM throughput is around 16-18Gbps.
However, when using OVS-DPDK with the same setting, the throughput drops
down to 4-6Gbps.


You don't mention the traffic profile.  I assume 64 byte frames but best 
to be explicit.




SW/driver configurations:
==
DPDK

In config common_base, besides the defaults, I have enabled the following
extra drivers/features to be compiled/enabled.
CONFIG_RTE_LIBRTE_MLX5_PMD=y
CONFIG_RTE_LIBRTE_VHOST=y
CONFIG_RTE_LIBRTE_VHOST_NUMA=y
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_VIRTIO_USER=n
CONFIG_RTE_EAL_VFIO=y


OVS
---
$ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.13.90

$sudo ovs-vsctl get Open_vSwitch . dpdk_initialized
true

$sudo ovs-vsctl get Open_vSwitch . dpdk_version
"DPDK 19.11.3"

OS settings
---
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster


$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
quiet


Why don't you reserve any CPUs for OVS/DPDK or VM usage?  All published 
performance white papers recommend settings for CPU isolation like this 
Mellanox DPDK performance report:


https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf

For their test system:

isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0 
intel_pstate=disable nohz_full=24-47 rcu_nocbs=24-47 rcu_nocb_poll 
default_hugepagesz=1G hugepagesz=1G hugepages=64 audit=0 nosoftlockup

Using the tuned service (CPU partitioning profile) makes this process easier:

https://tuned-project.org/
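
On RHEL-family hosts the profile is applied roughly like this (a sketch; 
the isolated core list is whatever you reserve for OVS/DPDK and the VMs):

sudo dnf install tuned-profiles-cpu-partitioning
echo "isolated_cores=12-19" | sudo tee /etc/tuned/cpu-partitioning-variables.conf
sudo tuned-adm profile cpu-partitioning
sudo reboot   # the profile also adjusts kernel boot parameters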



./usertools/dpdk-devbind.py --status
Network devices using kernel driver
===
:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2 drv=mlx5_core
unused=igb_uio,vfio-pci

Due to the way Mellanox cards and their driver work, I have not bound
igb_uio to the interface; however, the uio, igb_uio and vfio-pci kernel modules
are loaded.


Relevant part of the VM-config for Qemu/KVM
---
[libvirt VM-config XML stripped by the mail archive; only the value 4096 survived]

Where did you get these CPU mapping values?  x86 systems typically map 
even-numbered CPUs to one NUMA node and odd-numbered CPUs to a different 
NUMA node.  You generally want to select CPUs from the same NUMA node as 
the mlx5 NIC you're using for DPDK.


You should have at least 4 CPUs in the VM, selected according to the 
NUMA topology of the system.
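
A quick way to check which node the NIC sits on (the PCI address is the 
one from your devbind output, with the 0000 domain assumed):

cat /sys/bus/pci/devices/0000:b3:00.0/numa_node
lscpu | grep "NUMA node"   # then pick vCPUs from the matching node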


Take a look at this bash script written for Red Hat:

https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh

It gives you a good starting reference which CPUs to select for the 
OVS/DPDK and VM configurations on your particular system.  Also review 
the Ansible script pvp_ovsdpdk.yml, it provides a lot of other useful 
steps you might be able to apply to your Debian OS.


[libvirt XML stripped by the mail archive: NUMA cell with memAccess='shared'
and a vhost-user interface on /usr/local/var/run/openvswitch/vhostuser]

Is there a requirement for mergeable RX buffers?  Some PMDs like mlx5 
can take advantage of SSE instructions when this is disabled, yielding 
better performance.



   
   



I don't see hugepage usage in the libvirt XML.  Something similar to:

  8388608
  8388608
  [memoryBacking/hugepages elements stripped by the mail archive]


---
OVS Start Config
---
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,0"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xff
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e


These two masks shouldn't overlap:
https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
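
For example, a non-overlapping split might look like this (core numbers 
are illustrative; keep the PMD cores on the NIC's NUMA node):

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x2   # core 1 for OVS housekeeping
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0xc      # cores 2-3 for PMD threads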

Re: [dpdk-users] Assistance Request: Error adding DPDK VFIO NIC to OVS+DPDK Bridge

2020-05-07 Thread David Christensen




On 5/4/20 2:18 PM, Wittling, Mark (CCI-Atlanta) wrote:

Greetings, I cannot get OpenVSwitch to add my DPDK-bound VFIO port (e1000 
DPDK-compatible NIC) to the bridge without an error.

The error is at the end, and I am supplying all of the info I know the 
community would typically ask me for, before I show the error at the bottom.

Any help would be greatly appreciated.

...

# cat ovs-vswitchd.log
2020-05-04T21:12:11.071Z|00291|dpdk|ERR|EAL: Driver cannot attach the device 
(:01:00.0)


What do you see with the following bash command:

for a in /sys/kernel/iommu_groups/*; do find $a -type l; done | sort 
--version-sort


There may be other devices in the same IOMMU group as your NICs.  If so, 
then you need to bind VFIO to those devices as well.  Found this link 
that might be helpful if you're unfamiliar with the concept of IOMMU groups:


https://heiko-sieger.info/iommu-groups-what-you-need-to-consider/
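
If other devices do share the group, a sketch of binding them as well 
(the PCI addresses are illustrative):

# bind every endpoint in the IOMMU group, not just the NIC you want
sudo ./usertools/dpdk-devbind.py -b vfio-pci 0000:01:00.0 0000:01:00.1
sudo ./usertools/dpdk-devbind.py --status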

Dave


Re: [dpdk-users] How to configure an Ethernet Driver to ignore the Ethertype

2020-02-26 Thread David Christensen

I was wondering if there is a way to get the Ethernet driver to ignore the
Ethertype. The problem I am having is that I am dealing with a switch chip
that is redirecting packets to the processor via an offload ethernet
device. Before offload the switch chip adds a header that pushes the DA
into the spot where the EtherType normally sits. As a result, the ethertype
is essentially a random value, so some packets will look like they have a
SNAP header and the driver will conclude that the packet is corrupted, I
presume. I am working with the IXGBE driver.


Are you talking about HiGig/HiGig2 headers? I think support in DPDK will 
be limited to octeontx2 (look for switch_header="higig2" in 
https://fast.dpdk.org/doc/pdf-guides/nics-master.pdf).

Dave


Re: [dpdk-users] Using devbind.py for automated testing

2020-01-30 Thread David Christensen

My questions are:
- Has anyone developed similar functionality? If yes, please share the details.
- Is there a better way to achieve the same functionality?


Have you seen the driverctl application?
https://gitlab.com/driverctl/driverctl

I've found it useful for scripting with Ansible.

# driverctl list-devices
# driverctl list-overrides
# driverctl -v set-override :00:01.0 vfio-pci
# driverctl -v unset-override :00:01.0

Dave


[dpdk-users] Incorrect Link State with DPDK 18.11.1 and vhost PMD

2019-05-07 Thread David Christensen
I'm attempting to send traffic between host and guest using testpmd. 
Using the vhost PMD on the host side and the virtio PMD with vhost 
backend on the guest side.


I can successfully send traffic from the host to the guest (i.e. host 
has --forward-mode=txonly and guest has --forward-mode=rxonly) but I'm 
unable to send traffic in the opposite direction.


After further investigation I discovered that the host ports show link 
UP while the guest ports show link DOWN.  So the behavior makes sense, 
only the ports where link is UP are able to send traffic, but I don't 
understand why that might be the case.  I've also discovered that the 
testpmd commands for changing link state don't work (i.e. the command 
"set link-up port 0" fails in the guest and the command "set link-down 
port 0" fails in the host).


Has anyone encountered this situation or have experience in running this 
particular configuration?  My testpmd command line parameters for both 
are shown below, along with the qemu parameters used by the guest.


Dave

Host:
$ sudo 
/home/dave/src/p9-dpdk-perf/dpdk/ppc_64-power8-linuxapp-gcc/app/testpmd 
--vdev 'net_vhost1,queues=4,iface=/tmp/vhost-sock1' --vdev 
'net_vhost2,queues=4,iface=/tmp/vhost-sock2' -b :01:00.0 -b 
:01:00.1 -l 8,16-23 -n 4 -- --portmask=0x3 --rxq=4 --rxd=1024 
--txq=4 --txd=4096 --nb-cores=8 -i --numa --forward-mode=rxonly


Guest:
/home/dave/src/p9-dpdk-perf/dpdk/ppc_64-power8-linuxapp-gcc/app/testpmd 
-w 00:08.0 -w 00:09.0 -l 4,8-15 -n 4 -- --rxq=4 --rxd=1024 --txq=4 
--txd=4096 --nb-cores=8 -i --forward-mode=txonly


Qemu:
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin 
QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name 
guest=rhel7.6-alt,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-22-rhel7.6-alt/master-key.aes 
-machine pseries-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu 
host -m 32768 -realtime mlock=off -smp 24,sockets=1,cores=24,threads=1 
-object 
memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/22-rhel7.6-alt,share=yes,size=34359738368 
-numa node,nodeid=0,cpus=0-23,memdev=ram-node0 -uuid 
081f2381-716d-40b9-8b48-2c05626c9f54 -no-user-config -nodefaults 
-chardev socket,id=charmonitor,fd=24,server,nowait -mon 
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-boot menu=on,strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive 
file=/home/davec/images/rhel7.6.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=writeback 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2,write-cache=on 
-drive if=none,id=drive-scsi0-0-0-0,readonly=on -device 
scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 
-netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:8f:e8:8b,bus=pci.0,addr=0x1 
-chardev socket,id=charnet1,path=/tmp/vhost-sock1 -netdev 
vhost-user,chardev=charnet1,queues=4,id=hostnet1 -device 
virtio-net-pci,mq=on,vectors=10,rx_queue_size=256,netdev=hostnet1,id=net1,mac=52:54:00:2a:9f:eb,bus=pci.0,addr=0x8 
-chardev socket,id=charnet2,path=/tmp/vhost-sock2 -netdev 
vhost-user,chardev=charnet2,queues=4,id=hostnet2 -device 
virtio-net-pci,mq=on,vectors=10,rx_queue_size=256,netdev=hostnet2,id=net2,mac=52:54:00:4c:83:a2,bus=pci.0,addr=0x9 
-chardev pty,id=charserial0 -device 
spapr-vty,chardev=charserial0,id=serial0,reg=0x3000 -chardev 
socket,id=charchannel0,fd=28,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 
-device usb-kbd,id=input0,bus=usb.0,port=1 -device 
usb-mouse,id=input1,bus=usb.0,port=2 -vnc 127.0.0.1:0 -device 
VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x7 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny 
-msg timestamp=on