Re: Compiling testpmd with DPDK netvsc PMD

2024-06-12 Thread Stephen Hemminger
On Fri, 7 Jun 2024 16:31:51 -0700
Nandini Rangaswamy  wrote:

> Hi David,
> Thanks for your email. I inspected meson build output and do see that
> netvsc is in the list of enabled drivers.
> ===
> Drivers Enabled
> ===
> 
> common:
> iavf, mlx5, qat,
> bus:
> auxiliary, pci, vdev, vmbus,
> mempool:
> bucket, ring, stack,
> dma:
> 
> net:
> af_packet, bond, e1000, ena, failsafe, gve, i40e, iavf,
> ice, igc, ixgbe, kni, mlx5, *netvsc*, ring, tap,
> vdev_netvsc, vhost, virtio, vmxnet3,
> 
> Also, i changed the meson.build default_library=shared from static and it
> worked.
> Regards,
> Nandini
> 
> On Fri, Jun 7, 2024 at 3:56 AM David Marchand 
> wrote:
> 
> > Hello,
> >
> > On Thu, Jun 6, 2024 at 11:32 PM Nandini Rangaswamy
> >  wrote:  
> > > I tried compiling the testpmd with DPDK netvsc for openwrt by setting  
> > CONFIG_RTE_LIBRTE_NETVSC_PMD=y .  
> > >
> > > However, when I check ldd testpmd, it does not show any of the dpdk  
> > shared libraries including net_netvsc linked to testpmd binary.  

Testpmd is a special case. It is always statically linked because it needs
access to several drivers' private APIs.  So using ldd to check is not going to
give the right answer.

What does the startup of testpmd look like? You may need to enable debug logging
for vmbus and netvsc to see why the driver decided not to probe.
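
For example, something like this (the exact log type names vary between DPDK
releases; they can be listed from the application with rte_log_list_types()):

./dpdk-testpmd -l 0-3 --log-level='bus.vmbus,debug' \
        --log-level='pmd.net.netvsc*,debug' -- -i

That should show why each vmbus device was or was not probed.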


Re: Compiling testpmd with DPDK netvsc PMD

2024-06-11 Thread Stephen Hemminger
On Mon, 10 Jun 2024 09:50:42 +0200
David Marchand  wrote:

> Hello,
> 
> On Sat, Jun 8, 2024 at 1:32 AM Nandini Rangaswamy
>  wrote:
> > Thanks for your email. I inspected meson build output and do see that 
> > netvsc is in the list of enabled drivers.
> > ===
> > Drivers Enabled
> > ===
> >
> > common:
> > iavf, mlx5, qat,
> > bus:
> > auxiliary, pci, vdev, vmbus,
> > mempool:
> > bucket, ring, stack,
> > dma:
> >
> > net:
> > af_packet, bond, e1000, ena, failsafe, gve, i40e, iavf,
> > ice, igc, ixgbe, kni, mlx5, netvsc, ring, tap,
> > vdev_netvsc, vhost, virtio, vmxnet3,  
> 
> Ok, so the driver seems indeed part of this build, yet it was not
> functional at runtime?
> Could you confirm this driver was indeed embedded in the (*statically*
> linked) testpmd?
> $ ./usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | grep -i vsc
> "name": "net_netvsc",
> "name": "net_vdev_netvsc",
> 
> 
> >
> > Also, i changed the meson.build default_library=shared from static and it 
> > worked.  
> 
> Mm, the fact that changing link mode fixes the issue points at a link issue.
> 
> There is a bug with old pkg-config tool (<= 0.27 iirc) that does not
> process correctly dpdk .pc (for static link).
> It is worth checking which version of pkgconf is used in openwrt.
> 
> 

Does the OpenWrt kernel include the uio_hv_generic driver?
Did you bind the network device to uio_hv_generic as described in the
documentation: https://doc.dpdk.org/guides/nics/netvsc.html
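
For reference, the bind sequence in that guide is roughly the following (the
NET_UUID is the fixed Hyper-V network device class GUID from the guide; the
device GUID below is only a placeholder, use the one reported by lsvmbus or
driverctl on your system):

modprobe uio_hv_generic
NET_UUID="f8615163-df3e-46c5-913f-f2d2f965ed0e"
DEV_UUID="000d3a1e-4573-000d-3a1e-4573000d3a1e"   # placeholder
echo $NET_UUID > /sys/bus/vmbus/drivers/uio_hv_generic/new_id
echo $DEV_UUID > /sys/bus/vmbus/drivers/hv_netvsc/unbind
echo $DEV_UUID > /sys/bus/vmbus/drivers/uio_hv_generic/bind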



Re: Reg DPDK Support in Ubuntu - 24.04 LTS

2024-05-07 Thread Stephen Hemminger
On Tue, 7 May 2024 12:47:10 +0530
Harrish SJ  wrote:

> Hi DPDK Community,
> 
> We understand that latest DPDK release 24.03 is qualified in Ubuntu OS - 
> 22.04.4 LTS. Ref : https://doc.dpdk.org/guides/rel_notes/release_24_03.html 
> We would like to understand when latest Ubuntu - 24.04 LTS will be qualified 
> in upcoming DPDK release.?
> Do we have any known restriction on supporting recent DPDK releases on latest 
> Ubuntu OS LTS versions.?
> Thanks in advance,
> 
> Regards and Thanks,
> Harrish.S.J
> 

Support is up to Ubuntu and the package maintainer, not the DPDK community.


Re: High packet capturing rate in DPDK enabled port

2024-05-05 Thread Stephen Hemminger
On Sun, 5 May 2024 13:09:42 +0600
Fuji Nafiul  wrote:

> I have a DPDK-enabled port (Linux server) that serves around 5,000-50,000
> concurrent calls, per packet size of 80 bytes to 200 bytes. so in peak
> time, I require packet capture + file writing speed of around 1GByte/s or 8
> Gbit/sec (at least 0.5Gbyte/s is expected). dpdk official packet capture
> example project "dpdk-dumpcap"'s documentation says it has a capability of
> around 10MByte/s which is far less than required. I implemented a simple
> packet capture and pcap writing code which was able to dump
> around 5000-7000 concurrent call data where I used 1 core and 1 single ring
> of size 4096, and this was all integrated into actual media code (didn't
> use librte_pdump, simply copied to separate rte_ring after capturing
> through rte_eth_rx_burst() and before sending through rte_eth_tx_burst() ).
> I know I can launch this multiple cores and with multiple rings and so on
> but is there any current project which already does this?
> 
> I found a third-party project named "dpdkcap" which says it can support up
> to 10Gbit/s. Has anyone used it and what's the review?
> 
> Or, should I modify the "dpdk-dumpcap" project to my need to
> implement multicore and multi-ring support so I can extend the capability?
> Thanks in advance

The limitation of high-speed packet capture is more about the speed of writing
to disk. Doing a single write per packet is part of the problem. Getting higher
performance requires a faster SSD and using the io_uring API.

I do not believe that dpdkcap really supports writing at 10 Gbit/sec, only
that it can capture data on a 10 Gbit/sec device.
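
To illustrate the single-write-per-packet point, here is a sketch (not
dpdk-dumpcap code) of batching one burst into a single writev(). It only
handles single-segment mbufs and assumes the pcap file header was already
written to fd:

#include <stdint.h>
#include <sys/uio.h>
#include <sys/time.h>
#include <rte_mbuf.h>

/* Classic 16-byte per-packet pcap record header. */
struct pcap_rec {
        uint32_t ts_sec;
        uint32_t ts_usec;
        uint32_t incl_len;      /* captured length */
        uint32_t orig_len;      /* original length */
};

#define DUMP_BURST 64

/* Write one burst of mbufs with a single writev() instead of one write()
 * per packet, amortizing the syscall cost over the whole burst.
 */
static ssize_t
dump_burst(int fd, struct rte_mbuf *pkts[], uint16_t nb)
{
        struct pcap_rec rec[DUMP_BURST];
        struct iovec iov[DUMP_BURST * 2];
        struct timeval tv;
        int n = 0;

        if (nb > DUMP_BURST)
                nb = DUMP_BURST;
        gettimeofday(&tv, NULL);

        for (uint16_t i = 0; i < nb; i++) {
                struct rte_mbuf *m = pkts[i];

                rec[i] = (struct pcap_rec) {
                        .ts_sec = tv.tv_sec,
                        .ts_usec = tv.tv_usec,
                        .incl_len = rte_pktmbuf_data_len(m),
                        .orig_len = rte_pktmbuf_pkt_len(m),
                };
                iov[n].iov_base = &rec[i];
                iov[n++].iov_len = sizeof(rec[i]);
                iov[n].iov_base = rte_pktmbuf_mtod(m, void *);
                iov[n++].iov_len = rte_pktmbuf_data_len(m);
        }
        return writev(fd, iov, n);      /* one syscall for the whole burst */
}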


Re: No free hugepages reported

2024-03-31 Thread Stephen Hemminger
On Sun, 31 Mar 2024 16:28:19 +0530
Lokesh Chakka  wrote:

> Hello,
> 
> I've installed dpdk in Ubuntu 23.10 with the command "sudo apt -y install
> dpdk*"
> 
> added  "nodev /mnt/huge hugetlbfs pagesize=1GB 0 0" in /etc/fstab
> added "vm.nr_hugepages=1024" in /etc/sysctl.conf
> 
> rebooted the machine and then did devbind using the following command:
> 
> sudo modprobe vfio-pci && sudo dpdk-devbind.py --bind=vfio-pci 63:00.0
> 63:00.1
> 
> Huge page info is as follows :
> 
> *
> $ cat /proc/meminfo | grep Huge
> AnonHugePages:  6144 kB
> ShmemHugePages:0 kB
> FileHugePages: 0 kB
> HugePages_Total:1024
> HugePages_Free: 1023
> HugePages_Rsvd:0
> HugePages_Surp:0
> Hugepagesize:   2048 kB
> Hugetlb: 2097152 kB
> *

Your hugepages are not set up correctly. The mount is for 1G pages
but the sysctl entry allocates 2M pages.

Did you try using the dpdk-hugepages.py script?
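
For example, to actually get 1G pages matching that fstab mount, something like
the following (a sketch; adjust the amount, and note that on many systems 1G
pages can only be reliably reserved at boot via default_hugepagesz=1G
hugepagesz=1G hugepages=N on the kernel command line):

sudo dpdk-hugepages.py -p 1G --setup 4G

Alternatively, keep the 2M pages from the sysctl and change the fstab entry to
pagesize=2MB, or drop both and let dpdk-hugepages.py reserve and mount in one step.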


Re: --lcores: what it does and does not do

2024-03-30 Thread Stephen Hemminger
On Sat, 30 Mar 2024 17:45:41 -0400
fwefew 4t4tg <7532ya...@gmail.com> wrote:

> // If program run with --lcores=(1)@2,(2)@4 this loop will
> // create and run two threads lcore 1 pinned to CPU 2 and lcore 2
> // pinned to CPU 4. the output will look like:
> // hello from core 1
> // hello from core 2
> RTE_LCORE_FOREACH_WORKER(lcore_id) {
> rte_eal_remote_launch(lcore_hello, NULL, lcore_id);
> }
> .

You don't need that loop; that is what rte_eal_mp_remote_launch() does,
with more error checking.
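
Something like this (a sketch; the constants are named SKIP_MASTER/CALL_MASTER
in releases before 20.11):

/* equivalent to the RTE_LCORE_FOREACH_WORKER() loop above */
rte_eal_mp_remote_launch(lcore_hello, NULL, SKIP_MAIN);

/* later, wait for every worker lcore to return */
rte_eal_mp_wait_lcore();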


Re: --lcores: what it does and does not do

2024-03-30 Thread Stephen Hemminger
On Sat, 30 Mar 2024 12:06:15 -0400
fwefew 4t4tg <7532ya...@gmail.com> wrote:

> I've made a DPDK test program that does the following on the same machine
> and for the same NIC. This is a test; it does nothing practical:
> 
> * Creates 1 RX queue then reads and pretty prints contents in a loop
> * Creates 1 TX queue then sends packets to a hardcoded IP address in a loop
> 
> When I run this program I include the command line arguments
> "--lcores=(0)@1,(1)@2" which is passed to 'rte_eal_init'.
> 
> This means there's a lcore identified '0' run on CPU HW core 1, and lcore
> '1' run on CPU HW core 2.
> 
> As I understand it, the intent of the lcores argument is that this test
> program will eventually run the RX loop as lcore 0, and the TX loop as
> lcore 1 (or vice-versa).
> 
> On the other hand after all the required DPDK setup is done --- memzones,
> mempools, queues, NIC devices initialized and started --- here's what DPDK
> has not done:
> 
> * It hasn't started an application thread for lcore 0 or lcore 1
> * DPDK doesn't know the function entry point for either loop so no loops
> are running.
> 
> Which is totally fine ... DPDK isn't magic. If the application programmer
> wants a RX and TX application thread pinned to some CPU, it should create
> the threads, set the CPU affinity, and run the loop in the NUMA aligned
> way. This is trivial to do. That is, with the required DPDK setup done, all
> that's left is to do this trivial work ... and the test program is up and
> running: problem solved.
> 
> The --lcores doesn't and cannot do this application work. So what is the
> practical result of it? What does it do?

Did your application ever start the other threads with DPDK?
It is not recommended for applications to manage their own threads.
It is possible, but hard to get right.

A typical application looks like:


int main(int argc, char **argv)
{
        // do some initialization
        if (rte_eal_init(argc, argv) < 0)
                rte_exit(EXIT_FAILURE, "EAL init failed\n");

        // do more initialization that has to be done in the main thread

        // run worker_func on every lcore, including the main one
        rte_eal_mp_remote_launch(worker_func, arg, CALL_MAIN);

        // worker_func on the main lcore has returned; wait for the others
        rte_eal_mp_wait_lcore();

        // back to a single main thread
        rte_eal_cleanup();
        return 0;
}



Re: MLX5 VF stops transmitting when the Physical Function is added to a Linux bridge

2024-03-25 Thread Stephen Hemminger
On Mon, 25 Mar 2024 15:59:36 +0100
Antonio Di Bacco  wrote:

> I have a Connect X5 card (PF ens1f0np0) directly connected to another server:
> 
> 1) create VF on PF on both servers
> 2) change mac address of VFs to my own addressing
> 3) start testpmd on server 1 in txonly mode to transmit to server 0
> 4) start testpmd on server 0 in rxonly mode to receive
> 5) everything is fine, I keep receiving packets on node-0
> 
> Now, on server 1 I add the PF to a linux bridge, and everything's still fine.
> 
> If I add another interface (a simple 1Gbps with no VF, ens5f0)  to the
> linux bridge, then, I don't receive anymore packets on node-0
> 
> If I remove the ens5f0 from the bridge or I put down the ens5f0 the
> traffic flow restarts.
> 
> I understand that DPDK uses the VF directly with no dependencies on
> the kernel. How can operations on the kernel side (like adding an
> interface to bridge) can affect the VF?
> 
> 
> Best regards,
> Antonio.

Mellanox uses a bifurcated driver, so the kernel and DPDK interact.
Adding the device to a bridge will change its MAC address.


Re: dpdk-testpmd on XDP

2024-03-22 Thread Stephen Hemminger
On Fri, 22 Mar 2024 18:57:54 +0100
Alessio Igor Bogani  wrote:

> Hi Stephen,
> 
> Thank you for your support!
> 
> On Wed, 20 Mar 2024 at 17:24, Stephen Hemminger
>  wrote:
> [...]
> > Then you need to build bpf tools from the same kernel directory and use  
> [...]
> 
> Using the right version of libbpf makes all errors disappear!
> Unfortunately packets aren't flowing yet between the two interfaces
> (things that happen when I use the vfio-pci approach).
> Do you have other tips for me?
> 
> Thanks!
> 
> Ciao,
> Alessio

You are going to have to instrument and debug the internals of XDP.
I am not an expert there.


Re: dpdk-testpmd on XDP

2024-03-20 Thread Stephen Hemminger
On Wed, 20 Mar 2024 08:22:05 +0100
Alessio Igor Bogani  wrote:

> Stephen,
> 
> Thanks for your reply!
> 
> On Tue, 19 Mar 2024 at 17:07, Stephen Hemminger
>  wrote:
> >
> > On Tue, 19 Mar 2024 11:48:53 +0100
> > Alessio Igor Bogani  wrote:
> >  
> > > The only suspicious part in the output of the dpdk-testpmd utility is:
> > > [...]
> > > libxdp: XDP flag not supported by libxdp.
> > > libbpf: prog 'xdp_dispatcher': BPF program load failed: Invalid argument
> > > libbpf: prog 'xdp_dispatcher': -- BEGIN PROG LOAD LOG --
> > > Validating prog0() func#1...
> > > btf_vmlinux is malformed
> > > Arg#0 type PTR in prog0() is not supported yet.
> > > processed 0 insns (limit 100) max_states_per_insn 0 total_states 0
> > > peak_states 0 mark_read 0
> > > -- END PROG LOAD LOG --
> > > libbpf: failed to load program 'xdp_dispatcher'
> > > libbpf: failed to load object 'xdp-dispatcher.o'
> > > libxdp: Failed to load dispatcher: Invalid argument
> > > libxdp: Falling back to loading single prog without dispatcher
> > > [...]  
> >
> > What distribution and kernel version?  
> 
> Custom distribution (Yocto Kirkstone) using 5.10.184.All parts
> (kernel, DPDK, libbpf, xdp-tools) are built (cross-compiled) from
> source.

Then you need to build the bpf tools from the same kernel source tree and use
them in the DPDK build.  The problem is that the DPDK build uses pkg-config
to get the XDP and BPF library versions, and that probably won't work as expected
with this type of Yocto build.


Re: dpdk-testpmd on XDP

2024-03-19 Thread Stephen Hemminger
On Tue, 19 Mar 2024 11:48:53 +0100
Alessio Igor Bogani  wrote:

> The only suspicious part in the output of the dpdk-testpmd utility is:
> [...]
> libxdp: XDP flag not supported by libxdp.
> libbpf: prog 'xdp_dispatcher': BPF program load failed: Invalid argument
> libbpf: prog 'xdp_dispatcher': -- BEGIN PROG LOAD LOG --
> Validating prog0() func#1...
> btf_vmlinux is malformed
> Arg#0 type PTR in prog0() is not supported yet.
> processed 0 insns (limit 100) max_states_per_insn 0 total_states 0
> peak_states 0 mark_read 0
> -- END PROG LOAD LOG --
> libbpf: failed to load program 'xdp_dispatcher'
> libbpf: failed to load object 'xdp-dispatcher.o'
> libxdp: Failed to load dispatcher: Invalid argument
> libxdp: Falling back to loading single prog without dispatcher
> [...]

What distribution and kernel version?
BPF/XDP has changed a lot over the last couple of years and has not maintained
compatibility.  If you are building your own kernel, you likely need to build
the xdp library as well.  If you are getting them from a distro (Fedora, Ubuntu,
Debian, etc.) then make sure that the xdp library and kernel match.


Re: is RSS and Flow director can work together

2024-03-11 Thread Stephen Hemminger
On Mon, 11 Mar 2024 09:17:01 +
Balakrishnan K  wrote:

> Hi All,
> I want to use the dpdk application with RSS and flow director.
> is possible to use both at a time in application.
> In RSS, I am using
> action_rss_tcp.types = ETH_RSS_NONFRAG_IPV4_TCP | ETH_RSS_L3_SRC_ONLY | 
> ETH_RSS_L3_DST_ONLY;
> to receive the similar traffic to same core.
> One specific case where I wanted to distribute the traffic across core, here 
> the incoming traffic having same src and dst IP
> Example( src ip : 10.10.10.1 dst ip :20.20.20.2) .
> With RSS enabled all the traffic going to end up in one core ,where the 
> remaining cores are being idle impacting the performance.
> Planning enable flow director and create rule to distribute the traffic for 
> the combination src /dst ip (10.10.10.1 /20.20.20.2) along with RSS.
> 
> if RSS and flow rule having same criteria which one takes the priority .
> 
> Regards,
> Bala

You can do that with the rte_flow RSS action (rte_flow_action_rss).
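
A rough sketch of such a rule, using the DPDK 20.11 macro names from your
snippet (untested; whether the i40e PMD accepts this exact pattern/action
combination has to be verified, and the queue list here is arbitrary):

#include <rte_common.h>
#include <rte_byteorder.h>
#include <rte_ip.h>
#include <rte_ethdev.h>
#include <rte_flow.h>

/* Spread traffic for the one hot src/dst pair across queues 0-3 by hashing
 * on the L4 ports; everything else keeps using the port-level RSS.
 */
static struct rte_flow *
add_split_rule(uint16_t port_id, struct rte_flow_error *err)
{
        static const uint16_t queues[] = { 0, 1, 2, 3 };
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item_ipv4 ip_spec = {
                .hdr.src_addr = rte_cpu_to_be_32(RTE_IPV4(10, 10, 10, 1)),
                .hdr.dst_addr = rte_cpu_to_be_32(RTE_IPV4(20, 20, 20, 2)),
        };
        struct rte_flow_item_ipv4 ip_mask = {
                .hdr.src_addr = RTE_BE32(0xffffffff),
                .hdr.dst_addr = RTE_BE32(0xffffffff),
        };
        struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH },
                { .type = RTE_FLOW_ITEM_TYPE_IPV4,
                  .spec = &ip_spec, .mask = &ip_mask },
                { .type = RTE_FLOW_ITEM_TYPE_TCP },
                { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_rss rss = {
                .types = ETH_RSS_NONFRAG_IPV4_TCP |
                         ETH_RSS_L4_SRC_ONLY | ETH_RSS_L4_DST_ONLY,
                .queue = queues,
                .queue_num = RTE_DIM(queues),
        };
        struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, err);
}

As to which wins when both match: normally a specific rte_flow rule takes
precedence over the port-level RSS configuration, but that ordering is
driver-specific, so it is worth verifying on i40e.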


Re: testpmd: no probed ethernet devices i219-v vfio-pci

2024-03-08 Thread Stephen Hemminger
On Fri, 8 Mar 2024 21:19:08 +
sonntex  wrote:

> Hi,
> 
> I am trying to configure dpdk on my laptop and get "no probed ethernet
> devices" in dpdk-testpmd utility:
> 
> laptop :: ~ % sudo dpdk-testpmd -l 0-1 -n 4 --log-level=debug -- -i
> EAL: Detected CPU lcores: 8
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: VFIO support initialized
> testpmd: No probed ethernet devices
> Interactive-mode selected
> testpmd: create a new mbuf pool : n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Done
> testpmd> ...  
> 
> Checked that dpdk 23.07 supports this my NIC at
> http://doc.dpdk.org/guides/rel_notes/release_23_07.html:
> 
> Intel Corporation Ethernet Connection (16) I219-V
> Firmware version: 0.6-4
> Device id (pf): 8086:1a1f
> Driver version(in-tree): 5.15.113-rt64 (Ubuntu22.04.2)(e1000)
> 
> Configuration:
> 
> laptop :: ~ % pacman -Ss dpdk
> extra/dpdk 23.07-1 [installed]
> A set of libraries and drivers for fast packet processing
> 
> laptop :: ~ % sudo ethtool -i enp0s31f6
> driver: e1000e
> version: 6.7.8-arch1-1
> firmware-version: 0.6-4
> expansion-rom-version:
> bus-info: :00:1f.6
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> 
> laptop :: ~ % sudo modprobe vfio-pci
> laptop :: ~ % sudo lsmod | grep vfio
> vfio_pci   16384  0
> vfio_pci_core  86016  1 vfio_pci
> vfio_iommu_type1   45056  0
> vfio   73728  3 vfio_pci_core,vfio_iommu_type1,vfio_pci
> iommufd   106496  1 vfio
> irqbypass  12288  2 vfio_pci_core,kvm
> 
> laptop :: ~ % sudo dpdk-hugepages.py -m
> laptop :: ~ % sudo dpdk-hugepages.py -p 2M --setup 1G
> laptop :: ~ % sudo dpdk-hugepages.py -s
> Node Pages Size Total
> 0512   2Mb1Gb
> Hugepages mounted on /dev/hugepages
> 
> laptop :: ~ % sudo dpdk-devbind.py --status-dev net
> Network devices using kernel driver
> ===
> :00:14.3 'Comet Lake PCH-LP CNVi WiFi 02f0' if=wlan0 drv=iwlwifi
> unused= *Active*
> :00:1f.6 'Ethernet Connection (10) I219-V 0d4f' if=enp0s31f6 drv=e1000e
> unused=
> 
> laptop :: ~ % sudo dpdk-devbind.py -b vfio-pci :00:1f.6
> laptop :: ~ % sudo dpdk-devbind.py --status-dev net
> Network devices using DPDK-compatible driver
> 
> :00:1f.6 'Ethernet Connection (10) I219-V 0d4f' drv=vfio-pci
> unused=e1000e
> Network devices using kernel driver
> ===
> :00:14.3 'Comet Lake PCH-LP CNVi WiFi 02f0' if=wlan0 drv=iwlwifi
> unused=vfio-pci *Active
> 
> Any suggestions on what might be missing here?
> 
> Thanks!

Most likely the DPDK E1000 driver doesn't support the full range of PCI device
IDs that the kernel driver does. What is the PCI information for your device?
I have a similar device on this machine.

$ lspci -n -s 00:1f.6
00:1f.6 0200: 8086:15fc (rev 20)

In my case the part that matters is the 15fc.
Looking in DPDK drivers/net/e1000/base/e1000_hw.h, there is no #define for that
device ID and no entry in drivers/net/e1000/em_ethdev.c:pci_id_em_map[].

In the Linux kernel the entry is:
drivers/net/ethernet/intel/e1000e/hw.h:#define E1000_DEV_ID_PCH_TGP_I219_V13   0x15FC

The Intel drivers are not in sync.  It is up to the E1000 DPDK
maintainers to solve.
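
For reference, a local workaround would be a small patch along these lines
(hypothetical and untested; the macro name simply mirrors the kernel one):

/* drivers/net/e1000/base/e1000_hw.h */
#define E1000_DEV_ID_PCH_TGP_I219_V13   0x15FC

/* drivers/net/e1000/em_ethdev.c, added to pci_id_em_map[] */
{ RTE_PCI_DEVICE(E1000_INTEL_VENDOR_ID, E1000_DEV_ID_PCH_TGP_I219_V13) },

Whether the em code path then fully works on that MAC is a separate question.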

Note: this older E1000 hardware is not fast, and using DPDK on it,
except as a test bed, is really not worth it.


Re: Symmetric RSS Hashing support in DPDK

2024-03-06 Thread Stephen Hemminger
On Wed, 6 Mar 2024 07:28:40 +
Balakrishnan K  wrote:

> Hello,
>Our application needs symmetric hashing to handle the reverse traffic on 
> the same core, also to
> Improve performance by distributing the traffic across core.
> Tried using rss config as below .
> action_rss_tcp.types = ETH_RSS_NONFRAG_IPV4_TCP | ETH_RSS_L3_SRC_ONLY| 
> ETH_RSS_L3_DST_ONLY | ETH_RSS_L4_SRC_ONLY | ETH_RSS_L4_DST_ONLY;
> but could not get desired result.
> Is there any options or API available to enable symmetric RSS hashing .
> We are using dpdk 20.11 and intel NIC X710 10GbE .
> 
> Regards,
> Bala

With XL710 there are two choices:
1. Set RSS hash function to RTE_ETH_HASH_SYMMETRIC_TOEPLITZ in
   the rte_eth_rss_conf passed in during configure
2. Use default (non symmetric TOEPLITZ) but pass in a rss_key that
   has duplicated bits in the right place. Like:

0x6d5a 0x6d5a 0x6d5a 0x6d5a
0x6d5a 0x6d5a 0x6d5a 0x6d5a
0x6d5a 0x6d5a 0x6d5a 0x6d5a
0x6d5a 0x6d5a 0x6d5a 0x6d5a
0x6d5a 0x6d5a 0x6d5a 0x6d5a

https://www.ndsl.kaist.edu/~kyoungsoo/papers/TR-symRSS.pdf
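
A minimal sketch of option 2, using the DPDK 20.11 API names (newer releases
use RTE_ETH_ prefixes) and sizing the key from what the device reports:

#include <rte_ethdev.h>

/* Toeplitz key of repeating 0x6d5a so swapping src/dst gives the same hash. */
static uint8_t sym_key[64];

static int
configure_symmetric_rss(uint16_t port_id, struct rte_eth_conf *conf)
{
        struct rte_eth_dev_info info;
        int ret = rte_eth_dev_info_get(port_id, &info);

        if (ret != 0 || info.hash_key_size > sizeof(sym_key))
                return -1;
        for (unsigned int i = 0; i + 1 < info.hash_key_size; i += 2) {
                sym_key[i] = 0x6d;
                sym_key[i + 1] = 0x5a;
        }
        conf->rxmode.mq_mode = ETH_MQ_RX_RSS;
        conf->rx_adv_conf.rss_conf.rss_key = sym_key;
        conf->rx_adv_conf.rss_conf.rss_key_len = info.hash_key_size;
        conf->rx_adv_conf.rss_conf.rss_hf = ETH_RSS_NONFRAG_IPV4_TCP;

        /* then pass conf to rte_eth_dev_configure() as usual */
        return 0;
}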


Re: DPDK Pdump tool is not able to attach to DPDK test pmd primary application

2024-02-25 Thread Stephen Hemminger
On Thu, 22 Feb 2024 19:27:01 +
Sameer Vaze  wrote:

> I don't think the pdump application is doing the transmit/receive. Am I 
> missing something in the command line arguments I passed below that would 
> configure it to transmit/receive?
> 
> Passing in the rx-dev and tx-dev as pcap should not cause that right? The 
> documentation mentions those as mandatory arguments. I tried specifying an 
> interface as well and saw the same issue.
> 
> Thanks

The issue arises if send/receive is done in a secondary process as in:

1. Primary process
2. Secondary process that does send/receive
3. Pdump application (secondary)

If #2 does the send/receive, it will not be seen by #3.


Re: KNI RTNETLINK operation not supported

2024-02-20 Thread Stephen Hemminger
On Tue, 20 Feb 2024 23:23:03 +
John He  wrote:

> I am using a CentOS 7.9 VMware VM, kernel version
> 
> kernel-3.10.0-1160.92.1.el7.x86_64
> 
> I have created a KNI interface but when I run
> 
> ip link set  up
> 
> I get
> 
> RTNETLINK answers: Operation not supported
> 
> This same software and command works on a CentOS 7.7 bare iron machine.
> 
> What is the problem, and how do I investigate this?


KNI is deprecated and no longer supported. But then again, so is CentOS 7.


Re: DPDK Pdump tool is not able to attach to DPDK test pmd primary application

2024-02-17 Thread Stephen Hemminger
On Wed, 14 Feb 2024 22:44:46 +
Sameer Vaze  wrote:

> Hey Folks,
> 
> I see the following message when I attempt to attach pdump to the testpmd 
> sample application:
> 
> Pdump output:
> EAL: Failed to hotplug add device
> EAL: Error - exiting with code: 1
>   Cause: vdev creation failed:create_mp_ring_vdev:695
> 
> Testpmd output:
> Reached maximum number of Ethernet ports
> EAL: Driver cannot attach the device (net_pcap_rx_0)
> EAL: Failed to hotplug add device on primary
> 
> The primary and secondary applications were run using the following commands:
> 
> Primary: sudo ./dpdk-testpmd --proc-type=primary --file-prefix=test -d 
> /path/to/pmd
> Secondary: sudo ./dpdk-pdump --proc-type=secondary --file-prefix=test -d 
> /path/to/pmd -- --pdump 'port=0,queue=1,rx-dev=./rx.pcap,tx-dev=./tx.pcap'
> 
> DPDK version: 22.11.1
> 
> Is this a known issue? Is there any known fix for this?
> 
> Thanks
> Sameer Vaze

pdump and dumpcap can't work where the actual data transmit/receive is done in a
secondary process. The problem is in the design using callbacks and the way
pdump/dumpcap is initialized.

The pdump/dumpcap tool works as an additional secondary process.
At startup pdump/dumpcap communicates with the primary process to enable
callbacks on rx/tx, but these work only in the primary process. In a secondary
process, there are no rx/tx callbacks.


Re: DDP and symmetric RSS hash

2024-02-13 Thread Stephen Hemminger
On Mon, 29 Jan 2024 19:18:38 +0300
Виктория Доможакова  wrote:

> Hi,
> I'm trying to set toeplitz symmetric hash function for packets from DDP.
> I've created flows for pppoe/pppol2tpv2/l2tpv2 headers in patterns and 
> PCtypes as RSS hash types. But it is not working.
> What should I do to configure toeplitz symmetric hash function for packets 
> from ddp?
>  
> Best regards,
> Viktoriya Domozhakova
> 

With most hardware you have to use a Toeplitz key which results in a
symmetric hash. Some hardware does symmetric hashing by setting:

/**
 * Symmetric Toeplitz: src, dst will be replaced by
 * xor(src, dst). For the case with src/dst only,
 * src or dst address will xor with zero pair.
 */
RTE_ETH_HASH_FUNCTION_SYMMETRIC_TOEPLITZ,
/**
 * Symmetric Toeplitz: L3 and L4 fields are sorted prior to
 * the hash function.
 *  If src_ip > dst_ip, swap src_ip and dst_ip.
 *  If src_port > dst_port, swap src_port and dst_port.
 */
RTE_ETH_HASH_FUNCTION_SYMMETRIC_TOEPLITZ_SORT,

But that is restricted to newer smart NICs.


Re: Split traffic between the Linux stack and DPDK application

2024-01-29 Thread Stephen Hemminger
On Mon, 29 Jan 2024 18:09:28 +0200
Pavel Vazharov  wrote:

> Hi there,
> 
> A DPDK can run on top of XDP sockets and use custom XDP program to split
> the traffic between the Linux stack and the DPDK application. This way
> still allows zero copy between the kernel and the DPDK application.
> Is there another zero-copy way to achieve redirecting some part of the
> traffic to the Linux kernel and another to a DPDK application?
> For example, AFAIK I can run the DPDK application and redirect packets from
> inside to the Linux stack via the DPDK KNI functionality but it'll be much
> slower because it'll require packets copying and context switch (if I'm not
> mistaken).
> 
> Regards,
> Pavel.

There is no generic solution. It is possible with some hardware drivers,
like mlx5. Intel was working on bifurcation as well but seems to have abandoned it.

KNI is no longer part of DPDK, and it did a copy (hidden in the kernel).
The problem with zero copy, from the kernel's point of view, is how to deal with
packet lifetime.


Re: Questions about running XDP sockets on top of bonding device or on the physical interfaces behind the bond

2024-01-25 Thread Stephen Hemminger
On Thu, 25 Jan 2024 10:48:07 +0200
Pavel Vazharov  wrote:

> Hi there,
> 
> I'd like to ask for advice for a weird issue that I'm facing trying to run
> XDP on top of a bonding device (802.3ad) (and also on the physical
> interfaces behind the bond).
> 
> I've a DPDK application which runs on top of XDP sockets, using the DPDK 
> AF_XDP
> driver . It was a pure DPDK
> application but lately it was migrated to run on top of XDP sockets because
> we need to split the traffic entering the machine between the DPDK
> application and other "standard-Linux" applications running on the same
> machine.
> The application works fine when running on top of a single interface but it
> has problems when it runs on top of a bonding interface. It needs to be
> able to run with multiple XDP sockets where each socket (or group of XDP
> sockets) is/are handled in a separate thread. However, the bonding device
> is reported with a single queue and thus the application can't open more
> than one  XDP socket for it. So I've tried binding the XDP sockets to the
> queues of the physical interfaces. For example:
> - 3 interfaces each one is set to have 8 queues
> - I've created 3 virtual af_xdp devices each one with 8 queues i.e. in
> summary 24 XDP sockets each bound to a separate queue (this functionality
> is provided by the DPDK itself).
> - I've run the application on 2 threads where the first thread handled the
> first 12 queues (XDP sockets) and the second thread handled the next 12
> queues (XDP socket) i.e. the first thread worked with all 8 queues from
> af_xdp device 0 and the first 4 queues from af_xdp device 1. The second
> thread worked with the next 4 queues from af_xdp device 1 and all 8 queues
> from af_xdp device 2. I've also tried another distribution scheme (see
> below). The given threads just call the receve/transmit functions provided
> by the DPDK for the assigned queues.
> - The problem is that with this scheme the network device on the other side
> reports: "The member of the LACP mode Eth-Trunk interface received an
> abnormal LACPDU, which may be caused by optical fiber misconnection". And
> this error is always reported for the last device/interface in the bonding
> and the bonding/LACP doesn't work.
> - Another thing is that if I run the DPDK application on a single thread,
> and the sending/receiving on all queues is handled on a single thread, then
> the bonding seems to work correctly and the above error is not reported.
> - I've checked the code multiple times and I'm sure that each thread is
> accessing its own group of queues/sockets.
> - I've tried 2 different schemes of accessing but each one led to the same
> issue. For example (device_idx - queue_idx), I've tried these two orders of
> accessing:
> Thread 1Thread2
> (0 - 0) (1 - 4)
> (0 - 1) (1 - 5)
> ...(1 - 6)
> ...(1 - 7)
> (0 - 7) (2 - 0)
> (1 - 0) (2 - 1)
> (1 - 1) ...
> (1 - 2) ...
> (1 - 3) (2 - 7)
> 
> Thread 1Thread2
> (0 - 0) (0 - 4)
> (1 - 0) (1 - 4)
> (2 - 0) (2 - 4)
> (0 - 1) (0 - 5)
> (1 - 1) (1 - 5)
> (2 - 1) (2 - 5)
> ......
> (0 - 3) (0 - 7)
> (1 - 3) (1 - 7)
> (2 - 3) (2 - 7)
> 
> And here are my questions based on the above situation:
> 1. I assumed that it's not possible to run multiple XDP sockets on top of
> the bonding device itself and I need to "bind" the XDP sockets on the
> physical interfaces behind the bonding device. Am I right about this or am
> I missing something?
> 2. Is the bonding logic (LACP management traffic) affected by the access
> pattern of the XDP sockets?
> 3. Is this scheme supposed to work or it's just that the design is wrong? I
> mean, maybe a group of queues/sockets shouldn't be handled on a given
> thread but only a single queue should be handled on a given application
> thread. It's just that the physical devices have more queues setup on them
> than the number of threads in the DPDK application and thus multiple queues
> need to be handled on a single application thread.
> 
> Any ideas are appreciated!
> 
> Regards,
> Pavel.

Look at recent discussions on the netdev mailing list.
The Linux bonding device still needs more work to fully support XDP.


Re: DPDK Netvsc - RNDIS reports VF but device not found, retrying

2024-01-23 Thread Stephen Hemminger
On Tue, 23 Jan 2024 21:06:22 +0200
Oleksandr Nahnybida  wrote:

> > >
> > > I also installed rdma-core/libibverbs1 (dpdk was compiled
> > > with  ibverbs_link=shared)

Try without ibverbs_link=shared, maybe?

Mlx5 maintainers are:

M: Dariusz Sosnowski 
M: Viacheslav Ovsiienko 
M: Ori Kam 
M: Suanming Mou 
M: Matan Azrad 


Re: DPDK Netvsc - RNDIS reports VF but device not found, retrying

2024-01-23 Thread Stephen Hemminger
On Tue, 23 Jan 2024 19:51:33 +0200
Oleksandr Nahnybida  wrote:

> azureuser@dpdk0:~$ ethtool -i enP47056s2
> driver: mlx5_core
> version: 5.15.0-1053-azure
> firmware-version: 16.30.1284 (MSF12)
> expansion-rom-version:
> bus-info: b7d0:00:02.0
> 
> I also installed rdma-core/libibverbs1 (dpdk was compiled
> with  ibverbs_link=shared)
> 
> azureuser@dpdk0:~$ apt-cache policy rdma-core
> rdma-core:
>   Installed: 28.0-1ubuntu1
> azureuser@dpdk0:~$ apt-cache policy libibverbs1
> libibverbs1:
>   Installed: 28.0-1ubuntu1

Sorry, my expertise on all the variations of mlx5 driver stuff is
limited, and I no longer have free access to Azure.


Re: DPDK Netvsc - RNDIS reports VF but device not found, retrying

2024-01-23 Thread Stephen Hemminger
On Tue, 23 Jan 2024 14:34:12 +0200
Oleksandr Nahnybida  wrote:

> Hello,
> 
> I am trying to set up dpdk with netvsc as master pmd on Azure following
> https://learn.microsoft.com/en-us/azure/virtual-network/setup-dpdk?tabs=ubuntu
> and
> https://doc.dpdk.org/guides-22.11/nics/netvsc.html
> 
> but I have the following error messages
> 
> EAL: VFIO support initialized
> EAL: Probe PCI driver: mlx5_pci (15b3:1018) device: 8565:00:02.0 (socket 0)
> mlx5_net: DV flow is not supported.
> mlx5_common: Failed to allocate DevX UAR (BF/NC)
> mlx5_common: Failed to allocate UAR.
> mlx5_net: Failed to prepare Tx DevX UAR.
> mlx5_net: probe of PCI device 8565:00:02.0 aborted after encountering an
> error: Operation not permitted
> mlx5_common: Failed to load driver mlx5_eth
> EAL: Requested device 8565:00:02.0 cannot be used
> EAL: Probe PCI driver: mlx5_pci (15b3:1018) device: d377:00:02.0 (socket 0)
> mlx5_net: DV flow is not supported.
> mlx5_common: Failed to allocate DevX UAR (BF/NC)
> mlx5_common: Failed to allocate UAR.
> mlx5_net: Failed to prepare Tx DevX UAR.
> mlx5_net: probe of PCI device d377:00:02.0 aborted after encountering an
> error: Operation not permitted
> mlx5_common: Failed to load driver mlx5_eth
> EAL: Requested device d377:00:02.0 cannot be used
> EAL: Probe PCI driver: mlx5_pci (15b3:1018) device: f97e:00:02.0 (socket 0)
> mlx5_net: DV flow is not supported.
> mlx5_common: Failed to allocate DevX UAR (BF/NC)
> mlx5_common: Failed to allocate UAR.
> mlx5_net: Failed to prepare Tx DevX UAR.
> mlx5_net: probe of PCI device f97e:00:02.0 aborted after encountering an
> error: Operation not permitted
> mlx5_common: Failed to load driver mlx5_eth
> EAL: Requested device f97e:00:02.0 cannot be used
> EAL: Bus (pci) probe failed.
> hn_vf_attach(): Couldn't find port for VF
> hn_vf_add(): RNDIS reports VF but device not found, retrying
> hn_vf_attach(): Couldn't find port for VF
> hn_vf_add(): RNDIS reports VF but device not found, retrying
> and so on in a loop
> 
> Also, with debug logs I see
> 
> mlx5_common: DevX read access NIC register=0X9055 failed errno=22 status=0
> syndrome=0
> mlx5_common: DevX read access NIC register=0X9055 failed errno=22 status=0
> syndrome=0
> mlx5_common: DevX read access NIC register=0X9055 failed errno=22 status=0
> syndrome=0
> 
> 
> The DPDK version is 22.11, running on
> Linux dpdk0 5.15.0-1053-azure #61~20.04.1-Ubuntu
> VM type D8ls v5, accelerated networking is on for two ports
> 
> driverctl -b vmbus list-devices
> 
> 000d3a1c-e0df-000d-3a1c-e0df000d3a1c hv_netvsc
> 000d3a1e-4573-000d-3a1e-4573000d3a1e uio_hv_generic [*]
> 000d3a1e-47da-000d-3a1e-47da000d3a1e uio_hv_generic [*]
> 
> 
> Any, ideas what I might be doing wrong? I see the same behavior with
> testpmd and my app.

Looks like an mlx5 driver issue; which kernel driver are you using?
The netvsc code sees a VF but can't use it because of the mlx5 errors.


Re: Tapping a Tx port to access all the packets being Txed

2024-01-11 Thread Stephen Hemminger
On Thu, 11 Jan 2024 05:55:53 +
Nicolson Ken (ニコルソン ケン)  wrote:

> Hi all,
> 
> I'm wanting to basically tap a DPDK stream and for instance save all packets 
> being Txed by a PMD to a separate file.
> 
> I know I could relatively easily modify an existing PMD to have two output 
> ports, but is there either an existing PMD that does this, or a programmatic 
> way to graft a second port onto an arbitrary PMD, or is there another simpler 
> way that I haven't realised yet. As mbufs are already reference counted, I 
> would hope, and require, that such a solution would be zero copy. 
> RSS-friendly would also be a plus.
> 
> Thanks,
> Ken

You could use a Tx callback to do this, or pdump, or the pcap PMD.
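
A rough sketch of the Tx callback approach (zero copy: bump the refcount and
push the mbufs onto a ring that a separate writer lcore drains and frees;
tap_ring and the writer lcore are assumptions of this example, not an existing
DPDK facility):

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

static struct rte_ring *tap_ring;       /* drained by a separate writer lcore */

/* Called just before the PMD transmits a burst on this queue. */
static uint16_t
tx_tap_cb(uint16_t port_id, uint16_t queue,
          struct rte_mbuf *pkts[], uint16_t nb_pkts, void *user_param)
{
        RTE_SET_USED(port_id);
        RTE_SET_USED(queue);
        RTE_SET_USED(user_param);

        for (uint16_t i = 0; i < nb_pkts; i++)
                rte_pktmbuf_refcnt_update(pkts[i], 1);  /* keep mbuf alive */

        unsigned int enq = rte_ring_enqueue_burst(tap_ring, (void **)pkts,
                                                  nb_pkts, NULL);
        /* drop the extra reference for anything that did not fit */
        for (unsigned int i = enq; i < nb_pkts; i++)
                rte_pktmbuf_free(pkts[i]);

        return nb_pkts; /* let the PMD transmit the whole burst as usual */
}

/* registration, e.g. after rte_eth_dev_start():
 *      rte_eth_add_tx_callback(port_id, queue_id, tx_tap_cb, NULL);
 */

The writer lcore then dequeues from tap_ring, writes the packets out and calls
rte_pktmbuf_free() on them. Note this sees what the application hands to
rte_eth_tx_burst(), which is not necessarily exactly what goes out on the wire.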


Re: Detecting if rte_eal_init() has already been called

2023-12-26 Thread Stephen Hemminger
On Wed, 27 Dec 2023 03:54:12 +
Nicolson Ken (ニコルソン ケン)  wrote:

> Hi all,
> 
> I'm loading two semi-independent DPDK-ready shared libraries into a master 
> process. If both call rte_eal_init() I get a fatal error about calling it a 
> second time. I tried the rte_eal_primary_proc_alive(NULL) API, but that 
> failed to detect that the other library had already called rte_eal_init().
> 
> I feel there should be a simple rte_eal_is_inited()-like API somewhere, but I 
> cannot find it.
> 
> Note, I cannot easily change the master process as it is a third-party tool 
> that knows nothing about DPDK. For now I am just relying on the order of 
> loading the libraries and skipping rte_eal_init() on the second.
> 
> Thanks,
> Ken

Libraries should not be calling rte_eal_init()!
Even if you fix the init side, the shutdown/cleanup handling would be impacted.

Introducing an initializer and destructor in one place might be a workaround.
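
One possible shape for that (a minimal sketch; both libraries would call this
shim instead of rte_eal_init() directly, and the EAL arguments here are only an
example):

#include <pthread.h>
#include <rte_eal.h>

static pthread_once_t eal_once = PTHREAD_ONCE_INIT;
static int eal_rc = -1;

static void
eal_do_init(void)
{
        char arg0[] = "libshim";
        char arg1[] = "--in-memory";    /* example arguments only */
        char *argv[] = { arg0, arg1, NULL };

        eal_rc = rte_eal_init(2, argv);
}

/* Each library calls this instead of rte_eal_init(). */
int
shared_eal_init(void)
{
        pthread_once(&eal_once, eal_do_init);
        return eal_rc;
}

rte_eal_cleanup() could be guarded the same way from a single destructor.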


Re: VM SR-IOV VF fails dpdk

2023-12-12 Thread Stephen Hemminger
On Tue, 12 Dec 2023 16:33:51 +
"Lombardo, Ed"  wrote:

> Hi,
> I am finding an issue on ESXi host running our Linux VM application.  The 
> DPDK version is 17.11 and VM OS is 3.10.0-1160.81.1.el7.x86_64.
> ESXi host version is 7.03.
> The NIC is the Intel x710-DA2 in SR-IOV mode.  Configured on ESXi host 4 
> virtual functions.
> Our VM has one of the four virtual functions from the x710 NIC for DPDK port.
> 
> In our application, when we initialize the port and all the checks pass, but 
> rte_eth_dev_start() fails for -1 return code.
> The failure from gdb is:
> i40evf_configure_vsi_queues(): Failed to execute command of 
> VIRTCHNL_OP_CONFIG_VSI_QUEUES
> i40evf_dev_start(): configure queues failed
> 
> Our application requires 1 Rx queue and 4 Tx queues.
> 
> When I change the number of Tx queues from 4 to 1 the failure is not seen.  
> However our application requires 4 Tx queues so not an option.
> This same VM ran successfully when ESXi host was 5.5.
> 
> We had our IT department do a sanity check on the ESXi host and upgrade any 
> VIBs recommended, but did not resolve the problem.
> 
> Tried testpmd and also failed for DPDK start.
> 
> Thanks,
> Ed

Using a 5-year-old version of DPDK is not recommended or supported.

I am not familiar with VMware internals, but I suspect that it requires
the number of Rx and Tx queues to be the same.


Re: DPDK used for building programs and the one used to run the programs

2023-12-05 Thread Stephen Hemminger
On Tue, 5 Dec 2023 14:47:47 +0100
Antonio Di Bacco  wrote:

> On the target machine I use to have a DPDK compiled after installing
> the Mellanox 5/6 drivers.
> I see that there are files related to MLX5 pmds in the target machine
> (include files).
> 
> To compile my programs I use a container where there is installed a
> DPDK that doesn't have the MLX5 support, I mean I don't find the
> rte_mlx5_pmds.h in the container.
> 
> Now, a program compiled in the container could have problems when
> using MLX5 on the target machine?
> 
> Thanks,
> Anna.

Header files are only used when compiling.
A possible issue is that shared libraries the MLX5 PMD depends on may not be
visible in the container.


Re: how to make dpdk processes tolerable to segmantation fault?

2023-11-30 Thread Stephen Hemminger
On Thu, 30 Nov 2023 19:24:01 +0300
Dmitry Kozlyuk  wrote:

> 2023-11-30 13:45 (UTC+0600), Fuji Nafiul:
> > In a normal c program, I saw that the segmentation fault in 1 loosely
> > coupled thread doesn't necessarily affect other threads or the main
> > program. There, I can check all the threads by process ID of it in every
> > certain period of time and if some unexepected segmentation fault occurs or
> > got killed I can re run the thread and it works fine. I can later monitor
> > the logs and inspect the situation.
> > 
> > But I saw that, segmentation fault or other unexpected error in remotely
> > launched (using DPDK) functions on different core affects the whole dpdk
> > process and whole dpdk program crashes.. why is that?
> > 
> > Is there any alternative way to handle this scenario ? How can I take
> > measures for unexpected future error occurance where I should auto rerun
> > dpdk remote processes in live system?  
> 
> Please consider running the buggy code that causes SIGSEGV
> in a separate process rather than a thread.
> If it must use DPDK, can it be made an independent app?
> 
> DPDK is unlikely to ever support the described scenario.
> Continuing to run the process after SIGSEGV is inherently unsafe.
> Specifically, DPDK communicates with its lcore threads
> using pipes allocated at startup.
> If such thread crashed and a SIGSEGV not killing the app was installed,
> the communication would hang.
> Generally, DPDK employs user-space synchronization primitives,
> which cannot recover if one of the threads using them crashes.


A couple of things you can do:
  - run your DPDK application as a systemd service so it is restarted
when it crashes.
  - catch SIGSEGV in the application and print a backtrace, then abort.
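
For the second item, a minimal sketch using glibc backtrace() (note that
backtrace() is not strictly async-signal-safe, so treat this as a best-effort,
last-gasp debugging aid):

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

static void
segv_handler(int sig)
{
        void *frames[64];
        int n = backtrace(frames, 64);

        backtrace_symbols_fd(frames, n, STDERR_FILENO);
        /* restore the default action and re-raise so a core dump is produced */
        signal(sig, SIG_DFL);
        raise(sig);
}

/* in main(), after initialization:
 *      signal(SIGSEGV, segv_handler);
 *      signal(SIGBUS, segv_handler);
 */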


Re: Non eal registered thread flow

2023-11-29 Thread Stephen Hemminger
On Wed, 29 Nov 2023 14:21:55 -0800
Chris Ochs  wrote:

> Trying to get a handle on the best way to integrate with my existing
> architecture.
> 
> My main application is in Rust and it's a partitioned/batching flow. It's
> an end server. I basically send type erased streams between partitions
> using SPSC queues. Work scheduling is separate.  Workers basically do work
> stealing of partitions.  The important part is messaging is tied to
> partitions not threads.
> 
> So what I think might work best here is I assign a partition per lcore. I
> already have a design where partitions can be designated as network
> partitions, and my regular workers can then ignore these partitions.  With
> dpdk specific workers taking over.  I designed the architecture for use
> with user space networking generally from the start.
> 
> A partition in a networking flow consumes streams from other partitions
> like normal. In a dpdk flow what I think this looks like is for each stream
> call into C to transmit.  Streams would be written mbuf aligned so I think
> this is just a single memcpy per stream into dpdk buffers.  And then a
> single call to receive.
> 
> Does anything stand out here as problematic?  I read the known issues
> section and nothing there stood out as  problematic.

Are your lcores pinned and isolated?
Is your API per-packet or batched?
Are these DPDK ring buffers or some other queuing mechanism?




Re: Failed to load eBPF byte-code on TAP device

2023-11-16 Thread Stephen Hemminger
On Thu, 16 Nov 2023 13:41:27 +0530
madhukar mythri  wrote:

> Hi Stephen,
> 
> I had added some logs in the BPF verifier of Kernel code, to print the
> number of instructions processed and error-code returned as follows:
> 
> logs # dmesg |tail -n 20
> [   76.318101]  do_check: instructions Processed 89 insn
> [   76.318102]  do_check: instructions Processed 90 insn
> [   76.318103]  do_check: instructions Processed 91 insn
> [   76.318104]  do_check: instructions Processed 92 insn
> [   76.318105]  do_check: instructions Processed 93 insn
> [   76.318106]  do_check: instructions Processed 94 insn
> [   76.318107]  do_check: instructions Processed 95 insn
> [   76.318108]  do_check: instructions Processed 96 insn
> [   76.318109]  do_check: instructions Processed 97 insn
> [   76.318110]  do_check: instructions Processed 98 insn
> [   76.318111]  do_check: instructions Processed 99 insn
> [   76.318112]  do_check: instructions Processed 100 insn
> [   76.318113] BPF program is too large. Processed 101 insn
> [   76.318209] ## bpf_check:  do_check_main done..: ret: -7
> [   76.318210] ## bpf_check:  bpf_prog_offload_finalize done..:
> ret: -7
> [   76.318212] ## bpf_check:  check_max_stack_depth done..: ret: -7
> [   76.318212] ## bpf_check:  fixup_call_args done..: ret: -7
> [   76.318224] ## bpf_check:  end..: ret: -7
> [   76.318224] ##  BPF  bpf_check return err: -7..:
> =
> 
> Only these logs which I add in the Kernel-code were printed and do not see
> any other Kernel-logs.
> 
> Thanks,
> Madhuker.
> 
> On Wed, Nov 15, 2023 at 8:49 PM Stephen Hemminger <
> step...@networkplumber.org> wrote:  
> 
> > On Wed, 15 Nov 2023 15:38:55 +0530
> > madhukar mythri  wrote:
> >  
> > > Hi all,
> > >
> > > On the RHEL9.2 with DPDK 22.11.1 version, DPDK primary application failed
> > > to add RSS flow on TAP sub-device, when loading the TAP BPF byte-code
> > > instructions.
> > >
> > > This "struct bpf_insn l3_l4_hash_insns[]" array(from file:
> > > drivers/net/tap/tap_bpf_insns.h) is in eBPF bytecode instructions format,
> > > this eBPF failed to load on TAP PMD with the following error:
> > >
> > > =
> > > rss_add_actions(): Failed to load BPF section 'l3_l4' (7): Argument list
> > > too long.
> > > net_failsafe: Failed to create a flow on sub_device 1."
> > > =
> > > On Kernel-version:  5.15.0 #9 SMP PREEMPT
> > > Arch: x86_64 GNU/Linux
> > >
> > > When added some debug logs on Kernel BPF verifier code, we could see that
> > > instruction processed were reached to 1 Million.
> > > But, the Byte code has only 1698 instructions only. Why the Kernel BPF
> > > verifier is processing beyond 1,698 instructions ?
> > >
> > > The same byte-code(with DPDK-22.11.1) worked well with RHEL8.x and not
> > > working in RHEL-9.x version.
> > >
> > > Does anybody faced such issues ?
> > > Please let me know how to debug such issues on Byte-code.
> > >
> > > Thanks,
> > > Madhukar.  
> >
> > Is there anything in the kernel log?

I suspect a kernel bug.
The kernel BPF API is not stable, and RHEL can and does modify the kernel.
Likely a Red Hat bug.
Try with the recent TAP fixes (in 23.11-rc3).



Re: Failed to load eBPF byte-code on TAP device

2023-11-15 Thread Stephen Hemminger
On Wed, 15 Nov 2023 15:38:55 +0530
madhukar mythri  wrote:

> Hi all,
> 
> On the RHEL9.2 with DPDK 22.11.1 version, DPDK primary application failed
> to add RSS flow on TAP sub-device, when loading the TAP BPF byte-code
> instructions.
> 
> This "struct bpf_insn l3_l4_hash_insns[]" array(from file:
> drivers/net/tap/tap_bpf_insns.h) is in eBPF bytecode instructions format,
> this eBPF failed to load on TAP PMD with the following error:
> 
> =
> rss_add_actions(): Failed to load BPF section 'l3_l4' (7): Argument list
> too long.
> net_failsafe: Failed to create a flow on sub_device 1."
> =
> On Kernel-version:  5.15.0 #9 SMP PREEMPT
> Arch: x86_64 GNU/Linux
> 
> When added some debug logs on Kernel BPF verifier code, we could see that
> instruction processed were reached to 1 Million.
> But, the Byte code has only 1698 instructions only. Why the Kernel BPF
> verifier is processing beyond 1,698 instructions ?
> 
> The same byte-code(with DPDK-22.11.1) worked well with RHEL8.x and not
> working in RHEL-9.x version.
> 
> Does anybody faced such issues ?
> Please let me know how to debug such issues on Byte-code.
> 
> Thanks,
> Madhukar.

Is there anything in the kernel log?




Re: support hardware question

2023-11-03 Thread Stephen Hemminger
 
> hi
> Does dpdk support RealTek NICs? or is there any plan to support it?

No.
No volunteers, and no vendor involvement.

But DPDK is not really important at 1G and below.
You can use the TAP PMD to access the kernel driver.


Re: Questions about the the pdump functionality

2023-10-23 Thread Stephen Hemminger
On Mon, 23 Oct 2023 18:32:48 +0300
Pavel Vazharov  wrote:

> Hi there,
> 
> We've a DPDK based application from which we need to take packet dumps from
> time to time when problems arise. We are planning to use librte_pdump
> functions and the dpdk-dumppcap tool.
> I've few questions in related to the pdump functionality which we want to
> use:
> - Is calling `rte_pdump_init` at the startup of the main application
> causing some overhead for the packet processing during its run if there is
> no actual packet capturing enabled by the dpdk-dumppcap tool? I suppose
> there should be some check in place but is it something like a single `if`
> condition on a boolean flag or something heavier?
> - Is it possible then to call `rte_pdump_init` during the runtime of the
> main application only when I know that I'm about to start the dpdk-dumppcap
> tool? I mean, is it supported and safe to call `rte_pdump_init` and
> `rte_pdump_uninit` while the main application is running and processing
> packets or these functions are supposed to be called only at application
> startup and application stop.
> 
> Thanks in advance,
> Pavel.

Short answer:
The overhead only happens when the dump application is running.

Long answer:
Running rte_pdump_init() adds an additional service to the
multi-process (primary/secondary) communication mechanism.
Secondary and primary processes connect with each other over a Unix
domain socket, and the services are handled by the multi-process
thread in the primary process. Adding a service does not interact
with the fast path at all.

The best way (as always) to discover this yourself is to read
the source code and follow along.


Re: Whether the creatation of flow rules of i40e NIC support tcp port mask

2023-10-17 Thread Stephen Hemminger
On Tue, 17 Oct 2023 08:52:22 +
"jiangheng (G)"  wrote:

> Hi beilei,
> 
> I would like to create flows using tcp port mask, but it seems only mask 
> 0x or 0x0 work, Does flow rlue can be created using other mask?
> 
> I40e dirver was using now.

Why not create multiple rules, each pointing at the same action?
Or is the mask so wide that you would run out of rte_flow slots?


Re: Reg Packet Type Issue in Intel X710 NIC

2023-10-16 Thread Stephen Hemminger
On Mon, 16 Oct 2023 22:55:39 +0530
Harrish SJ  wrote:

> Thanks much Stephen for your inputs. Let me check the flags and get back.
> Any inputs on GTP Packet type as UNKNOWN would be much helpful. 
> Thanks in advance,
> 
> Regards and Thanks,
> Harrish.S.J


Packet type recognition is either done in software (slow) or in hardware.
The hardware recognition is limited by what that NIC is able to recognize.
I suspect most NICs just look at the ether type and maybe the IP proto field
while processing the packet.

In any case, packet type is not very reliable across NIC vendors.
If you really want to identify packets, either do it in SW or use
rte_flow if the NIC supports it.
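
For the software route, rte_net_get_ptype() does the parsing, e.g. (a sketch;
note its parser handles common L2/L3/L4 and some tunnels, but GTP classification
in software is limited):

#include <rte_mbuf.h>
#include <rte_net.h>

/* Classify a received mbuf in software instead of trusting m->packet_type. */
static uint32_t
sw_packet_type(struct rte_mbuf *m)
{
        struct rte_net_hdr_lens hdr_lens;

        return rte_net_get_ptype(m, &hdr_lens,
                                 RTE_PTYPE_L2_MASK | RTE_PTYPE_L3_MASK |
                                 RTE_PTYPE_L4_MASK);
}

/* e.g. if ((sw_packet_type(m) & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) ... */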


Re: Reg Packet Type Issue in Intel X710 NIC

2023-10-16 Thread Stephen Hemminger
On Mon, 16 Oct 2023 22:47:18 +0530
Harrish SJ  wrote:

> Hi Team,
> 
> We are observing below issues w.r.t Packet types in Intel X710 NIC
> 
> NIC Details: (from dpdk-devbind -s)
> Network devices using DPDK-compatible driver
> 
> :81:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' drv=igb_uio 
> unused=i40e,vfio-pci,uio_pci_generic
> :82:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' drv=igb_uio 
> unused=i40e,vfio-pci,uio_pci_generic
> 
> Packet type is not set as L2_ETHER_VLAN and only set as L2_ETHER when the 
> packet is received as VLAN tagged.
> IP Packet type is set as L3_IPV4_EXT_UNKNOWN for IP/IP+UDP packets - Is this 
> expected.?
> Packet type is set as UNKNOWN for GTP packets even after configuring DDP 
> profile for the ports used.
> 
> testpmd> ddp get list 0  
> Profile number is: 1
> 
> Profile 0:
> Track id: 0x8008
> Version:  1.0.4.0
> Profile name: GTPv1-C/U IPv4/IPv6 payload
> 
> Could you please help us in resolving/providing your inputs on the above 
> issues observed.?
> Thanks in advance,
> 
> Regards and Thanks,
> Harrish.S.J
> 

In most cases the VLAN tag is offloaded into the mbuf and RTE_MBUF_F_RX_VLAN_STRIPPED
is set in the offload flags.  If the VLAN is stripped it makes sense that the
packet type could just be L2_ETHER, but it looks like it may be driver dependent,
which is not good.
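
When checking for VLAN tags it is therefore safer to combine the offload flag
with the packet type, e.g. (a sketch, using the DPDK 22.x flag names; older
releases use PKT_RX_VLAN_STRIPPED):

#include <rte_mbuf.h>

/* Returns non-zero if the received frame carried a VLAN tag, whether or not
 * the NIC stripped it into m->vlan_tci.
 */
static inline int
frame_had_vlan(const struct rte_mbuf *m)
{
        return (m->ol_flags & RTE_MBUF_F_RX_VLAN_STRIPPED) ||
               (m->packet_type & RTE_PTYPE_L2_MASK) == RTE_PTYPE_L2_ETHER_VLAN;
}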


Re: How to establish a uni-directional Ethernet link in the dpdk environment

2023-10-15 Thread Stephen Hemminger
On Sun, 15 Oct 2023 10:30:48 +0330
Alireza Sadeghpour  wrote:

> Hi,
> 
> I am trying to establish a uni-directional Ethernet link where a singular
> fiber is used to transmit data to the receiver in the DPDK environment. The
> Rx of the transmit side and the Tx of the receive side are not physically
> connected, like in a Data diode scenario. The ethernet controller on both
> sides is intel 82580.
> 
> my problem is that when I detach the RX line from one side, both sides'
> links go down.
> 
> Could anyone please give me some advice to solve this problem and establish
> a valid unidirectional ethernet link?

This is not a DPDK problem. Trying a non-standard configuration like this
requires detailed knowledge of the hardware registers, and likely driver-specific
changes to do that.

It is possible to bring up the device in normal full-duplex mode and even set up
the receive queues but ignore them. But that doesn't sound like what you want.


Re: How to configure ethernet controller registers in the dpdk environment

2023-10-14 Thread Stephen Hemminger
On Sat, 14 Oct 2023 18:32:24 +0330
Alireza Sadeghpour  wrote:

>  
> 
> Hi,
> 
> I am trying to set some registers of the ethernet controller in the DPDK
> environment, but I can't find the corresponding API to do this. is there
> any API in the DPDK library for configuring the ethernet controller
> registers?


There is intentionally no API to set the registers of the ethernet controller.
All the configuration should be done in the Poll Mode Driver (PMD).
If you need custom configuration, it means adapting the PMD and ideally
making it part of the configuration settings and upstreaming it so it
doesn't get broken in the next release.


Re: capturing .pcap file on NIC interface

2023-10-10 Thread Stephen Hemminger
On Fri, 6 Oct 2023 07:53:13 -0700
Prasad Chivakula  wrote:

> Hi
> 
> I am trying to capture eCPRI packets on NIC card interface that is
> connected to a Radio in our 5G set up. I would like to know how to use
> dpdk-pdump to accomplish this? Do i have to rebuild dpdk with these
> options, if so, can you please provide instructions for this ?
> 
> Thanks in advance
> Prasad

The application has to call rte_pdump_init() at startup.
Then use the dpdk-dumpcap application. The older pdump is more complex
to use and has fewer features.
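
In the application it is a single call right after EAL init (a minimal sketch):

#include <stdio.h>
#include <rte_pdump.h>

/* in main(), right after rte_eal_init() succeeds */
if (rte_pdump_init() < 0)
        printf("pdump init failed, capture will not be possible\n");

Then, while the application is running, capture with something like
dpdk-dumpcap -i 0 -w /tmp/ecpri.pcapng (see the dumpcap how-to guide for the
exact options for your DPDK version).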


Re: net/tap: eBPF failed to load BPF section and failed to create flows for TAP device

2023-10-09 Thread Stephen Hemminger
On Mon, 9 Oct 2023 07:04:49 +
Kouilakonda Anudattu  wrote:

> I'm adding some additional details to provide more context. I'm encountering 
> this issue on Azure/Hyper-V platform with the existing DPDK-22.11.1 
> tap_bpf_insns.h byte-code.

Unless your use case demands rte_flow, use the netvsc PMD.
It has better performance.


Re: net/tap: eBPF failed to load BPF section and failed to create flows for TAP device

2023-10-09 Thread Stephen Hemminger
On Mon, 9 Oct 2023 07:04:49 +
Kouilakonda Anudattu  wrote:

> I'm adding some additional details to provide more context. I'm encountering 
> this issue on Azure/Hyper-V platform with the existing DPDK-22.11.1 
> tap_bpf_insns.h byte-code.
> 
> 
> Regards,
> Anudattu.
> 
> From: Kouilakonda Anudattu 
> Sent: Monday, October 9, 2023 9:48 AM
> To: users@dpdk.org
> Subject: [External] : net/tap: eBPF failed to load BPF section and failed to 
> create flows for TAP device
> 
> Hi All,
> 
> 
> With the latest Oracle EL9 with DPDK 22.11.1 version, I modified the RSS eBPF 
> C program and generated the structure of a C array in the 'tap_bpf_insns.h' 
> file.
> This array is in eBPF bytecode instructions format. However, even with new 
> bytecode eBPF failed to load TAP PMD with the following error:
> 
> rss_add_actions(): Failed to load BPF section 'l3_l4' (7): Argument list too 
> long.
> net_failsafe: Failed to create a flow on sub_device 1."
> 
> 
> Currently we are using below kernel:
> 5.15.0 #9 SMP PREEMPT
> x86_64 GNU/Linux
> 
> 
> How to resolve these errors ?

You need to debug the TAP device either with gdb or by adding printfs.
Did you see the modified build instructions?

https://patchwork.dpdk.org/project/dpdk/patch/20230722163259.4304-1-step...@networkplumber.org/




Re: rte_rdtsc() - what is the performance impact of using rte_rdtsc() time

2023-10-03 Thread Stephen Hemminger
On Tue, 3 Oct 2023 15:49:00 +0530
Hari Haran  wrote:

> >
> > The problem is that rte_rdtsc() stops speculative execution so doing
> > lots of TSC instructions can hurt performance.
> >
> > To correlate TSC timestamp to system time, you need to compute the offsets
> > once at startup. Alternatively, don't use rte_rdtsc() and instead use
> > clock_gettime() with the monotonic timer and the C library does the
> > calculation
> > for you.
> >  
> 
> 
> As part of query 1 and based on your response, I am asking below query
> 
> But usage of clock_gettime() (kernel function) in lcore is advisable one?
> My understanding is,
> shall avoid usage of kernel function in lcore. Correct me if I am wrong?

clock_gettime() is a virtual system call (vDSO) on most Linux platforms.
But it is still slower than a simple rte_rdtsc().

Internally the clock_gettime() vDSO does a similar TSC-based conversion, just
with correction factors maintained by the kernel.
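
If you want rte_rdtsc() speed but wall-clock timestamps, a minimal sketch of the
compute-the-offset-once approach mentioned above (one 64-bit divide per call;
whether that cost matters depends on the workload):

#include <stdint.h>
#include <time.h>
#include <rte_cycles.h>

static uint64_t base_ns;        /* wall clock at startup */
static uint64_t base_tsc;       /* TSC read at the same instant */
static uint64_t tsc_hz;

/* call once after rte_eal_init() */
static void
ts_init(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_REALTIME, &ts);
        base_tsc = rte_rdtsc();
        base_ns = (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
        tsc_hz = rte_get_tsc_hz();
}

/* cheap per-packet timestamp: one rdtsc, no syscall */
static inline uint64_t
ts_now_ns(void)
{
        uint64_t delta = rte_rdtsc() - base_tsc;

        return base_ns + delta / tsc_hz * 1000000000ULL +
               (delta % tsc_hz) * 1000000000ULL / tsc_hz;
}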


Re: tap device speed

2023-10-02 Thread Stephen Hemminger
On Mon, 2 Oct 2023 21:13:03 +0200
Antonio Di Bacco  wrote:

> I'm doing a test where we have a couple of tap devices, the two
> devices are seen by testpmd that is setup in forward mode.
> 
> On the linux side, the two tap devices are confined in different
> network namespaces and in one namespace we have an iperf server while
> on the other namespace the iperf client sending either UDP or TCP.
> 
> I expected a bandwidth in the range of few gpbs while the actual
> measured bandwidth is a few gigabits.
> 
> I suppose I need to configure the tap devices with optimized
> parameters but I don't know where to look for advice.
> 
> If I try to use the loopback interface I can get something 40 gbps
> with a command like this:
> 
> iperf -c 127.0.0.1 -u -i 1 -b 40g -t 10 -l 4
> 
> .

Sorry, the TAP device is inherently slow. It requires copies to/from the Linux
kernel. You are doing well if you get 1 million packets per second.

One thing to check is that the checksum is not being computed twice.

Re: dumpcap: timestamp is years ahead when in pcapng format

2023-09-21 Thread Stephen Hemminger
On Thu, 21 Sep 2023 08:14:12 +0300
Isaac Boukris  wrote:

> > This is less accurate. The TSC (CPU clock frequency) is not necessarily
> > an even multiple of nanoseconds.
> >
> > If you want to send patches please follow the contributing guidelines
> > and run checkpatch on them.  
> 
> Yeah, I realized that and tried to improve on it by dividing by less
> at init and multiplying by less at run time. However I noticed another
> problem with my patches, that there is a time gap that keeps growing
> for some reason, and I can't figure out what's wrong. I'll try some
> more and will gladly test anything proposed.

All this should be fixed in this patch series.
https://patchwork.dpdk.org/project/dpdk/list/?series=29581


Re: dumpcap: timestamp is years ahead when in pcapng format

2023-09-20 Thread Stephen Hemminger
On Wed, 20 Sep 2023 22:55:21 +0300
Isaac Boukris  wrote:

> I found a way to get a better resolution; at init we set
> 'pcapng_time.tsc_hz=rte_get_tsc_hz()/NSEC_PER_SEC' this way we keep
> the number of cycles in a nano-second, then at run time we just need
> to divide delta by this number (with no need to multiply by
> NSEC_PER_SEC).
> 
> The problem is I guess, that on slow systems we'll end up with
> tsc_hz=0? Perhaps we'd need to drop to ms resolution in such a case.
> 
> With the attach patch I get:
> 
> 2023-09-20 10:22:13.579219 IP Rocky8 > A: ICMP echo request, id 13,
> seq 63, length 64
> 2023-09-20 10:22:13.580582 IP A > Rocky8: ICMP echo reply, id 13, seq
> 63, length 64 3
> 2023-09-20 10:22:14.745176 IP Rocky8 > A: ICMP echo request, id 13,
> seq 64, length 64
> 2023-09-20 10:22:14.746206 IP ...
> 
> On Wed, Sep 20, 2023 at 9:53 PM Isaac Boukris  wrote:
> >
> > I figured the first packet bug, fixed with:
> > -   if (!pcapng_time.tsc_hz)
> > +   if (!pcapng_time.tsc_hz) {
> > pcapng_init();
> > +   return pcapng_time.ns;
> > +   }
> >
> > However I noticed a caveat with my proposed fix as it seem we only get
> > a time resolution of one sec:
> >
> > 2023-09-20 09:40:20.727638 IP Rocky8 > A: ICMP echo request, id 11,
> > seq 81, length 64
> > 2023-09-20 09:40:20.727638 IP A > Rocky8: ICMP echo reply, id 11, seq
> > 81, length 64
> > 2023-09-20 09:40:21.727638 IP ...
> >
> > On Wed, Sep 20, 2023 at 8:59 PM Isaac Boukris  wrote:
> > >
> > > On Tue, Sep 19, 2023 at 9:00 PM Stephen Hemminger
> > >  wrote:
> > > >
> > > > On Tue, 19 Sep 2023 19:35:55 +0300
> > > > Isaac Boukris  wrote:
> > > >
> > > > > Looking with git log, i found the original line was:
> > > > > return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
> > > > >
> > > > > Testing that does show a wrapping issue, e.g. (it stays around 08:05).
> > > > >
> > > > > 2023-09-19 08:05:24.372037 IP _gateway.domain > Rocky8.38358: 31975
> > > > > NXDomain 0/0/0 (46)  10
> > > > > 2023-09-19 08:05:21.577497 ARP, Request who-has _gateway tell Rocky8,
> > > > > length 46
> > > > > 2023-09-19 08:05:21.577599 ARP, Reply _gateway is-at 00:50:56:f8:92:76
> > > > > (oui Unknown), length 46 13
> > > > > 2023-09-19 08:05:22.833897 IP 192.168.202.1.50886 >
> > > > > 239.255.255.250.ssdp: UDP, length 174
> > > > >
> > > > > However with my change it looks fine and always increments. I dropped
> > > > > all the parenthesis:
> > > > > return pcapng_time.ns + delta / pcapng_time.tsc_hz * NSEC_PER_SEC;
> > > >
> > > > The issue is that timestamping is in the fast path and that 64 bit 
> > > > divide is slow.
> > > > Looking at other alternatives.
> > >
> > > Then perhaps we can keep the division optimization and just get rid of
> > > the overflow check, relying on the change to multiply by NSEC_PER_SEC
> > > after the division.
> > >
> > > With the below change only the first packet is from 2257 while all
> > > subsequent packets are fine. But if I keep the overflow check and only
> > > change to multiply after the division, then all packets are shown from
> > > 2257.
> > >
> > > [admin@Rocky8 dpdk]$ git diff lib/pcapng/rte_pcapng.c
> > > diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
> > > index 80d08e1..fa545cd 100644
> > > --- a/lib/pcapng/rte_pcapng.c
> > > +++ b/lib/pcapng/rte_pcapng.c
> > > @@ -79,7 +79,7 @@ static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
> > >  * Currently all TSCs operate below 5GHz.
> > >  */
> > > delta = cycles - pcapng_time.cycles;
> > > -   if (unlikely(delta >= pcapng_time.tsc_hz)) {
> > > +   if (0 && unlikely(delta >= pcapng_time.tsc_hz)) {
> > > if (likely(delta < pcapng_time.tsc_hz * 2)) {
> > > delta -= pcapng_time.tsc_hz;
> > > pcapng_time.cycles += pcapng_time.tsc_hz;
> > > @@ -92,8 +92,9 @@ static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
> > > }
> > > }
> > >
> > > -   return pcapng_time.ns + rte_reciprocal_divide_u64(delta * 
> > > NSEC_PER_SEC,
> > > -
> > > _time.tsc_hz_inverse);
> > > +   return pcapng_time.ns + rte_reciprocal_divide_u64(delta,
> > > +
> > > _time.tsc_hz_inverse) * NSEC_PER_SEC;
> > >  }

This is less accurate. The TSC (CPU clock frequency) is not necessarily
an even multiple of nanoseconds.

If you want to send patches please follow the contributing guidelines
and run checkpatch on them.


Re: dumpcap: timestamp is years ahead when in pcapng format

2023-09-19 Thread Stephen Hemminger
On Tue, 19 Sep 2023 19:35:55 +0300
Isaac Boukris  wrote:

> Looking with git log, i found the original line was:
> return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
> 
> Testing that does show a wrapping issue, e.g. (it stays around 08:05).
> 
> 2023-09-19 08:05:24.372037 IP _gateway.domain > Rocky8.38358: 31975
> NXDomain 0/0/0 (46)  10
> 2023-09-19 08:05:21.577497 ARP, Request who-has _gateway tell Rocky8,
> length 46
> 2023-09-19 08:05:21.577599 ARP, Reply _gateway is-at 00:50:56:f8:92:76
> (oui Unknown), length 46 13
> 2023-09-19 08:05:22.833897 IP 192.168.202.1.50886 >
> 239.255.255.250.ssdp: UDP, length 174
> 
> However with my change it looks fine and always increments. I dropped
> all the parenthesis:
> return pcapng_time.ns + delta / pcapng_time.tsc_hz * NSEC_PER_SEC;

The issue is that timestamping is in the fast path and a 64-bit divide is
slow.
Looking at other alternatives.
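
For reference, the usual way to avoid a 64-bit divide per packet is to
precompute a reciprocal once at startup and use a multiply/shift in the fast
path. A rough sketch of the idea (not the final pcapng code, just the
pattern):

#include <rte_cycles.h>
#include <rte_reciprocal.h>

#define NSEC_PER_SEC 1000000000UL

static uint64_t tsc_hz;
static struct rte_reciprocal_u64 tsc_hz_inverse;

/* Once at startup, outside the fast path. */
static void
tsc_ns_init(void)
{
	tsc_hz = rte_get_tsc_hz();
	tsc_hz_inverse = rte_reciprocal_value_u64(tsc_hz);
}

/* Fast path: convert a small TSC delta to nanoseconds. The caller keeps
 * delta below about one second of cycles so delta * NSEC_PER_SEC cannot
 * overflow 64 bits. */
static uint64_t
tsc_delta_to_ns(uint64_t delta)
{
	return rte_reciprocal_divide_u64(delta * NSEC_PER_SEC,
					 &tsc_hz_inverse);
}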


Re: rte_exit() does not terminate the program -- is it a bug or a new feature?

2023-09-18 Thread Stephen Hemminger
On Mon, 18 Sep 2023 20:23:25 +0200
Gabor LENCSE  wrote:

> Of course, I will review all my rte_exit calls... It'll take a while...
> 
> I am just curious, as I have no idea, why my old code worked all right 
> with DPDK 16.11. Has rte_exit() been changed since then?

Older versions of DPDK did not call eal_cleanup() and did not
have service lcores. I think service lcores were new in 18.11.


Re: rte_exit() does not terminate the program -- is it a bug or a new feature?

2023-09-17 Thread Stephen Hemminger
On Sun, 17 Sep 2023 21:37:30 +0200
Gabor LENCSE  wrote:

> However, l2fwd also uses the "rte_exit()" function to terminate the 
> program. The only difference is that it calls the "rte_exit()" function 
> from the main program, and I do so in a thread started by the 
> "rte_eal_remote_launch()" function.

Calling rte_exit in a thread other than main thread won't work because
the cleanup code is calling rte_eal_cleanup, and inside that it ends
up waiting for all workers.  Since the thread you are calling from
is a worker, it ends up waiting for itself.

rte_exit()
rte_eal_cleanup()
rte_service_finalize()
rte_eal_mp_wait_lcore()


void
rte_eal_mp_wait_lcore(void)
{
unsigned lcore_id;

RTE_LCORE_FOREACH_WORKER(lcore_id) {
rte_eal_wait_lcore(lcore_id);
}
}

Either service handling needs to be smarter, the rte_exit() function needs to
check whether it is called from the main lcore, and/or the documentation needs
an update. It is not a simple fix, because in order to safely do the cleanup
logic all threads have to go to a quiescent state.
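
As a workaround in the application, a worker launched with
rte_eal_remote_launch() can simply return an error code and let the main lcore
call rte_exit(). A minimal sketch (the worker body is a placeholder for the
real sending loop):

#include <stdlib.h>
#include <rte_debug.h>
#include <rte_launch.h>
#include <rte_lcore.h>

/* Worker: return nonzero on failure instead of calling rte_exit(). */
static int
sender(void *arg)
{
	(void)arg;
	/* ... send test frames, measure elapsed time ... */
	return -1;	/* e.g. the time limit was exceeded */
}

static int
run_test(void)
{
	unsigned int lcore_id;
	int failed = 0;

	RTE_LCORE_FOREACH_WORKER(lcore_id)
		rte_eal_remote_launch(sender, NULL, lcore_id);

	/* rte_eal_wait_lcore() returns the worker's return value. */
	RTE_LCORE_FOREACH_WORKER(lcore_id)
		if (rte_eal_wait_lcore(lcore_id) != 0)
			failed = 1;

	if (failed)
		rte_exit(EXIT_FAILURE, "a worker reported failure, test invalid\n");
	return 0;
}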



Re: Concurrent invocations of the dpdk-dumpcap tool

2023-09-16 Thread Stephen Hemminger
On Sun, 17 Sep 2023 00:11:31 +0300
Isaac Boukris  wrote:

> I'm testing with the 22.11.3 version code (latest LTS) but there don't
> seem to be significant changes in master.
> 
> Given the above I had the hunch the ring names collide in shared
> memory, so I changed both the ring and the pool names in the dumpcap
> tool to include the pid, and with that the above scenario works fine
> as expected, is that a proper fix?

If you have two processes trying to capture on the same interface, it
isn't going to work because both will be fighting over the same
ring of packets. Making it work would require more extensive changes
where multiple copies of each packet are made.


Re: rte_exit() does not terminate the program -- is it a bug or a new feature?

2023-09-15 Thread Stephen Hemminger
On Fri, 15 Sep 2023 20:28:44 +0200
Gabor LENCSE  wrote:

> Dear Stephen,
> 
> Thank you very much for your answer!
> 
> > Please get a backtrace. Simple way is to attach gdb to that process.  
> 
> I have recompiled siitperf with the "-g" compiler option and executed it 
> from gdb. When the program stopped, I pressed Ctrl-C and issued a "bt" 
> command, but of course, it displayed the call stack of the main thread. 
> Then I collected some information about the threads using the "info 
> threads" command and after that I switched to all available threads, and 
> issued a "bt" command for those that represented my send() and receive() 
> functions (I identified them using their LWP number). Here are the results:
> 
> root@x033:~/siitperf# gdb ./build/siitperf-tp
> GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
> Copyright (C) 2022 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
>      .
> 
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./build/siitperf-tp...
> (gdb) set args 84 800 60 2000 2 2
> (gdb) run
> Starting program: /root/siitperf/build/siitperf-tp 84 800 60 2000 2 2
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> EAL: Detected CPU lcores: 56
> EAL: Detected NUMA nodes: 4
> EAL: Detected shared linkage of DPDK
> [New Thread 0x749c0640 (LWP 24747)]
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> [New Thread 0x741bf640 (LWP 24748)]
> EAL: Selected IOVA mode 'PA'
> EAL: No free 2048 kB hugepages reported on node 0
> EAL: No free 2048 kB hugepages reported on node 1
> EAL: No free 2048 kB hugepages reported on node 2
> EAL: No free 2048 kB hugepages reported on node 3
> EAL: No available 2048 kB hugepages reported
> EAL: VFIO support initialized
> [New Thread 0x739be640 (LWP 24749)]
> [New Thread 0x731bd640 (LWP 24750)]
> [New Thread 0x729bc640 (LWP 24751)]
> [New Thread 0x721bb640 (LWP 24752)]
> EAL: Probe PCI driver: net_ice (8086:159b) device: :98:00.0 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> EAL: Probe PCI driver: net_ice (8086:159b) device: :98:00.1 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> [New Thread 0x719ba640 (LWP 24753)]
> TELEMETRY: No legacy callbacks, legacy socket not created
> ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
> ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
> Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
> Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
> Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
> Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
> Info: Testing initiated at 2023-09-15 18:06:05
> Reverse frames received: 394340224
> Forward frames received: 421381420
> Info: Forward sender's sending took 70.3073795726 seconds.
> EAL: Error - exiting with code: 1
>    Cause: Forward sending exceeded the 60.000600 seconds limit, the 
> test is invalid.
> Info: Reverse sender's sending took 74.9384769772 seconds.
> EAL: Error - exiting with code: 1
>    Cause: Reverse sending exceeded the 60.000600 seconds limit, the 
> test is invalid.
> ^C
> Thread 1 "siitperf-tp" received signal SIGINT, Interrupt.
> 0x77d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> (gdb) bt
> #0  0x77d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #1  0x5559929e in Throughput::measure (this=0x7fffe300, 
> leftport=0, rightport=1) at throughput.cc:3743
> #2  0x7b20 in main (argc=7, argv=0x7fffe5b8) at 
> main-tp.cc:34
> (gdb) info threads
>    Id   Target Id   Frame
> * 1    Thread 0x777cac00 (LWP 24744) "siitperf-tp" 
> 0x77d99dd2 in rte_eal_wait_lcore ()
>     from /lib/x86_64-linux-gnu/librte_eal.so.22
>    2    Thread 0x749c0640 (LWP 24747) "eal-intr-thread" 
> 0x77a32fde in epoll_wait (epfd=6, events=0x749978d0,
>      maxevents=3, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
>    3    Thread 0x741bf640 (LWP 24748) "rte_mp_handle" 
> __recvmsg_syscall (flags=0, msg=0x741965c0, fd=9)
>      at 

Re: rte_exit() does not terminate the program -- is it a bug or a new feature?

2023-09-15 Thread Stephen Hemminger
On Fri, 15 Sep 2023 10:24:01 +0200
Gabor LENCSE  wrote:

> Dear DPDK Developers and Users,
> 
> I have met the following issue with my RFC 8219 compliant SIIT and 
> stateful NAT64/NAT44 tester, siitperf: 
> https://github.com/lencsegabor/siitperf
> 
> Its main program starts two sending threads and two receiving threads on 
> their exclusively used CPU cores using the rte_eal_remote_launch() 
> function, e.g., the code is as follows:
> 
>    // start left sender
>    if ( rte_eal_remote_launch(send, , cpu_left_sender) )
>      std::cout << "Error: could not start Left Sender." << 
> std::endl;
> 
> When the test frame sending is finished, the senders check the sending 
> time, and if the allowed time was significantly exceeded, the sender 
> gives an error message and terminates (itself and also the main program) 
> using the rte_exit() function.
> 
> This is the code:
> 
>    elapsed_seconds = (double)(rte_rdtsc()-start_tsc)/hz;
>    printf("Info: %s sender's sending took %3.10lf seconds.\n", side, 
> elapsed_seconds);
>    if ( elapsed_seconds > duration*TOLERANCE )
>      rte_exit(EXIT_FAILURE, "%s sending exceeded the %3.10lf seconds 
> limit, the test is invalid.\n", side, duration*TOLERANCE);
>    printf("%s frames sent: %lu\n", side, sent_frames);
> 
>    return 0;
> 
> The above code worked as I expected, while I used siitperf under Debian 
> 9.13 with DPDK 16.11.11-1+deb9u2. It always displayed the execution time 
> of test frame sending, and if the allowed time was significantly exceed, 
> then it gave an error message, and it was terminated, thus the sender 
> did not print out the number of send frames. And also the main program 
> was terminated due to the call of this function: it did not write out 
> the "Info: Test finished." message.
> 
> However, when I updated siitperf to use it with Ubuntu 22.04 with DPDK 
> version "21.11.3-0ubuntu0.22.04.1 amd64", then I experienced something 
> rather strange:
> 
> In the case, when the sending time is significantly exceeded, I get the 
> following messages from the program (I copy here the full output, as it 
> may be useful):
> 
> root@x033:~/siitperf# cat temp.out
> EAL: Detected CPU lcores: 56
> EAL: Detected NUMA nodes: 4
> EAL: Detected shared linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No free 2048 kB hugepages reported on node 0
> EAL: No free 2048 kB hugepages reported on node 1
> EAL: No free 2048 kB hugepages reported on node 2
> EAL: No free 2048 kB hugepages reported on node 3
> EAL: No available 2048 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_ice (8086:159b) device: :98:00.0 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> EAL: Probe PCI driver: net_ice (8086:159b) device: :98:00.1 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> TELEMETRY: No legacy callbacks, legacy socket not created
> ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
> ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
> Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
> Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
> Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
> Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
> Info: Testing initiated at 2023-09-15 07:50:17
> EAL: Error - exiting with code: 1
>    Cause: Forward sending exceeded the 60.000600 seconds limit, the 
> test is invalid.
> EAL: Error - exiting with code: 1
>    Cause: Reverse sending exceeded the 60.000600 seconds limit, the 
> test is invalid.
> root@x033:~/siitperf#
> 
> The rte_exit() function seems to work, as the error message appears, and 
> the number of sent frames is not displayed, however, the "Info: ..." 
> message about the sending time (printed out earlier in the code) is 
> missing! This is rather strange!
> 
> What is worse, the program does not stop, but *the sender threads and 
> the main program remain running (forever)*.
> 
> Here is the output of the "top" command:
> 
> top - 07:54:24 up 1 day, 14:12,  2 users, load average: 3.02, 2.41, 2.10
> Tasks: 591 total,   2 running, 589 sleeping,   0 stopped,   0 zombie
> %Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni, 94.1 id,  0.0 wa,  0.0 hi, 5.9 si,  
> 0.0 st
> %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu7  :  0.0 

Re: Can independent dpdk applications in 2 separate pod use same cpu.

2023-09-09 Thread Stephen Hemminger
On Sat, 9 Sep 2023 13:04:23 + (UTC)
Amy Smith  wrote:

> Hi,I have 2 independent dpdk application pods working now using different set 
> of cpu cores. For low cost use case I would like them to use same cpu. I have 
> 1 cpu core which I want both pods to share. Is it possible? Do I need to make 
> any changes to applications to use non eal thread, changes to scheduling 
> thread etc.Thanks!

Possible, yes, but it will kill performance since most DPDK applications do
pure polling.
What will happen is one application will run until preempted by the kernel
scheduler, then the other one.
The preemption happens on a clock tick, so each one will run until the clock
interrupt (usually 4 ms).
On top of that there is the overhead of context switching and cache misses.



Re: EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list

2023-09-08 Thread Stephen Hemminger
On Fri, 8 Sep 2023 14:31:22 + (UTC)
Don Trotter  wrote:

>  On Thursday, September 7, 2023 at 09:26:34 PM CDT, jiangheng (G) 
>  wrote:
>  
>  
>  
> Hi
>  
> I've had the same error. It may be caused by this parameter:
>  
> https://github.com/DPDK/dpdk/blob/v22.11/config/rte_config.h#L33C9-L33C33
>  
> If you use 2MB hugepages, the max memory size is 2MB * 8192 = 16GB. This 
> error will occur when you allocate more than 16GB of memory per numa.
>  
>   
>  
> But should not have this error if you use 512 MB of memory, can you show 
> RTE_MAX_MEMSEG_PER_LIST in your dpdk?
>  
> Check whether the other sizes hugepage  is used, Theoretically, this error 
> indicates that the rte_eal_init init fails and the program should exit.
>    My initial statement was that the log  was seen "from EAL during EAL 
> init", but that is not right. The call to rte_eal_init() is successful. Later 
> I allocate the memory, using rte_mempool_create("node_pool", (128 * 1024 * 
> 1024 - 1), 256, 256, 0, NULL, NULL, NULL, NULL, rte_socket_id(), 0). That is 
> when I see the message. And the call to allocate the memory is successful, 
> and I am processing packets and allocating nodes just fine. Here is the 
> information from rte_config.h.
> $ grep RTE_MAX_MEM  
> ./cn98xx-release-output/build/dpdk/config/rte_config.h#define 
> RTE_MAX_MEMSEG_LISTS 128#define RTE_MAX_MEMSEG_PER_LIST 8192#define 
> RTE_MAX_MEM_MB_PER_LIST 32768#define RTE_MAX_MEMSEG_PER_TYPE 32768#define 
> RTE_MAX_MEM_MB_PER_TYPE 65536#define RTE_MAX_MEMZONE 2560
>  
>   
>  
> 发件人: Don Trotter 
> 发送时间: 2023年9月8日 5:33
> 收件人: Stephen Hemminger 
> 抄送: users@dpdk.org
> 主题: Re: EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
>  
>   
>  
> On Thursday, September 7, 2023 at 03:21:12 PM CDT, Stephen Hemminger 
>  wrote:
>  
>   
>  
>   
>  
> On Thu, 7 Sep 2023 19:58:35 + (UTC)
>  
> 
> Don Trotter  wrote:
> 
> >  To clarify, the log message when my application called 
> >rte_mempool_create() to create the "node_pool", and the call succeeded.
> > Thanks,Don Trotter
> >    On Thursday, September 7, 2023 at 01:54:08 PM CDT, Don Trotter 
> > wrote: 
> > 
> >  Hi,
> > I have recently started using DPDK. I am working on a project on OcteonTX2 
> > with DPDK 11.23.01. I am seeing this message from EAL during EAL init, but 
> > everything is working.
> >     EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
> > The system has 96GB of memory.  These 2 pools get created and everything 
> > works fine.
> > mempool @0x13fed3e00  flags=10  socket_id=0  pool=0x114030  
> > iova=0x13fed3e00  nb_mem_chunks=1  size=65535  populated_size=65535  
> > header_size=128  elt_size=10200  trailer_size=40  total_obj_size=10368  
> > private_data_size=128  ops_index=0  ops_name:   avg 
> > bytes/object=10368.558602
> > mempool @0x1575d8180  flags=10  socket_id=-1  pool=0x19d00  
> > iova=0x1575d8180  nb_mem_chunks=2  size=134217727  populated_size=134217727 
> >  header_size=128  elt_size=256  trailer_size=0  total_obj_size=384  
> > private_data_size=0  ops_index=3  ops_name:   avg 
> > bytes/object=384.94
> > You read that right. I’ve got 128*1024*1024 256 byte buffers created for 
> > data.
> > I also see there is still heap left, although pretty low after.
> > Heap id:0        Heap name:socket_0        Heap_size:55834574848,        
> > Free_size:2403644544,        Alloc_size:53430930304,        
> > Greatest_free_size:536870016,        Alloc_count:293,        Free_count:5,
> > Linux free stats.
> > tmp# free -h -w              total        used        free      shared     
> > buffers       cache   availableMem:           95Gi        88Gi       6.5Gi  
> >      444Mi          0B       470Mi       599MiSwap:            0B          
> > 0B          0B
> > Is there anything wrong with that EAL log? Is there a lurking problem?
> > Thanks,Don Trotter  
>  
> 
> >
> >   
> 
> Did you setup hugepages?
> How many and what size?
> Is this a NUMA system?
> 
> Also 11.23.01 seems like a weird release number.
> The DPDK release numbering scheme is year followed by month. I.e. 22.11 was 
> released in November of 2022
>  
>   
>  
>   
>  
> # cat /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages
>  
> 176
>  
>   
>  
> Yes it is a NUMA system.
>  
>   
>  
> You are correct sir. The SDK is SDK11.23.01 (2023-01) and DPDK is 22.11.
>  
>   
>  
> Thanks, Don Trotter
>  
>   
> 
>

You may need to add hugepages on each NUMA node.


Re: EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list

2023-09-07 Thread Stephen Hemminger
On Thu, 7 Sep 2023 19:58:35 + (UTC)
Don Trotter  wrote:

>  To clarify, the log message when my application called rte_mempool_create() 
> to create the "node_pool", and the call succeeded.
> Thanks,Don Trotter
> On Thursday, September 7, 2023 at 01:54:08 PM CDT, Don Trotter 
>  wrote:  
>  
>  Hi,
> I have recently started using DPDK. I am working on a project on OcteonTX2 
> with DPDK 11.23.01. I am seeing this message from EAL during EAL init, but 
> everything is working.
>     EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
> The system has 96GB of memory.  These 2 pools get created and everything 
> works fine.
> mempool @0x13fed3e00  flags=10  socket_id=0  pool=0x114030  
> iova=0x13fed3e00  nb_mem_chunks=1  size=65535  populated_size=65535  
> header_size=128  elt_size=10200  trailer_size=40  total_obj_size=10368  
> private_data_size=128  ops_index=0  ops_name:   avg 
> bytes/object=10368.558602
> mempool @0x1575d8180  flags=10  socket_id=-1  pool=0x19d00  
> iova=0x1575d8180  nb_mem_chunks=2  size=134217727  populated_size=134217727  
> header_size=128  elt_size=256  trailer_size=0  total_obj_size=384  
> private_data_size=0  ops_index=3  ops_name:   avg 
> bytes/object=384.94
> You read that right. I’ve got 128*1024*1024 256 byte buffers created for data.
> I also see there is still heap left, although pretty low after.
> Heap id:0        Heap name:socket_0        Heap_size:55834574848,        
> Free_size:2403644544,        Alloc_size:53430930304,        
> Greatest_free_size:536870016,        Alloc_count:293,        Free_count:5,
> Linux free stats.
> tmp# free -h -w              total        used        free      shared     
> buffers       cache   availableMem:           95Gi        88Gi       6.5Gi    
>    444Mi          0B       470Mi       599MiSwap:            0B          0B   
>        0B
> Is there anything wrong with that EAL log? Is there a lurking problem?
> Thanks,Don Trotter
> 
>   

Did you set up hugepages?
How many and what size?
Is this a NUMA system?

Also, 11.23.01 seems like a weird release number.
The DPDK release numbering scheme is year followed by month, i.e. 22.11 was
released in November of 2022.


Re: rte_rdtsc() - what is the performance impact of using rte_rdtsc() time

2023-09-05 Thread Stephen Hemminger
On Tue, 29 Aug 2023 20:25:54 +0530
Hari Haran  wrote:

> Hi All,
> 
> Subject: rte_rdtsc() - what is the performance impact of using rte_rdtsc()
> time under lcore thread while(1)
> 
> Requirement:
> 
>1. Store the packet received timestamp - based on it packets will be
>removed from buffer if it exceeds the threshold timer of buffer
>2. Two threads are available, One is lcore(dedicated core) and another
>is pthread(not a dedicated core. In pthread, have to get the timestamp of
>last received packet timestamp
> 
> 
> Query:
> 
>1. For every packet reception in lcore thread under while(1), will get
>the packet received timestamp using  rte_rdtsc() function. Whether usage of
>rte_rdtsc() function adds more delay in packet processing?
>2. Is there any way to convert rte_rdtsc() timestamp value to current
>system time in pthread()? In pthread, the last packet received time needed
>in the form of system time.
> 
> 
> Thanks in advance.
> 
> Regards,
> Hariharan

The problem is that rte_rdtsc() stops speculative execution so doing
lots of TSC instructions can hurt performance.

To correlate TSC timestamp to system time, you need to compute the offsets
once at startup. Alternatively, don't use rte_rdtsc() and instead use
clock_gettime() with the monotonic timer and the C library does the calculation
for you.
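
As a rough sketch of computing that offset once at startup (assuming an
invariant TSC), sample both clocks together at init and convert later TSC
values to system time in the other thread:

#include <stdint.h>
#include <time.h>
#include <rte_cycles.h>

#define NSEC_PER_SEC 1000000000ULL

static uint64_t base_ns;	/* wall-clock time at init, in ns */
static uint64_t base_tsc;	/* TSC sampled at the same moment */
static uint64_t tsc_hz;

static void
tsc_clock_init(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	base_tsc = rte_rdtsc();
	base_ns = (uint64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
	tsc_hz = rte_get_tsc_hz();
}

/* Convert a TSC value recorded on the lcore to system time in ns.
 * The division is split to avoid overflowing delta * NSEC_PER_SEC. */
static uint64_t
tsc_to_system_ns(uint64_t tsc)
{
	uint64_t delta = tsc - base_tsc;

	return base_ns + (delta / tsc_hz) * NSEC_PER_SEC +
	       (delta % tsc_hz) * NSEC_PER_SEC / tsc_hz;
}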


Re: vfio module crash when i used dpdk secondary process send pkts

2023-09-05 Thread Stephen Hemminger
On Thu, 24 Aug 2023 09:06:44 +0800 (CST)
jinag  <15720603...@163.com> wrote:

> This is definitely a bug in the kernel, I am looking to see if there's a 
> bugfix patch in the kernel community.  but maybe the app passed an incorrect 
> parameter that caused kernel crash?
> The kernel version is 4.18.0-147.5.1.9, it is based on upstream linux at 
> version 4.18.0

Is this RHEL?
4.18 was end of life in November 2018.


Re: vfio module crash when i used dpdk secondary process send pkts

2023-08-23 Thread Stephen Hemminger
On Tue, 22 Aug 2023 09:33:39 +0800 (CST)
jinag <15720603...@163.com> wrote:

> when I use dpdk to call rte_eth_tx_burst function for sending data from the 
> secondary process, vfio will crash:
> 
> 
> PID: 60699 TASK: 8f0152235df00 CPU: 14  COMMAND: "testlstack02"
>  #0 [a7d8cecc39a8] machine_kexec at 9045d67b
>  #1 [a7d8ceec3a00] __crash_kexec at 90562e92
>  #2 [a7d8ceec3ac0] panic at 904b9b79
>  #3 [a7d8ceec3b48] oops_end at 904231fc
>  #4 [a7d8ceec3b70] remap_pfn_range at 90664772
>  #5 [a7d8ceec3c58] remap_pfn_range at 90664772
>  #6 [a7d8ceec3da0] vfio_pci_mmap_fault at c-bda821 [vfio_pci] 
>  #7 [a7d8ceec3dc0] __do_fault at 90662ee9
>  #8 [a7d8ceec3df0] do fault at 90663b2c
>  #9 [a7d8ceec3e90] __handle_mm_fault at 90663c9c
> #10 [a7d8ceec3ec0] handle_mm_fault at 9047103b
> #11 [a7d8ceec3ee0] __do_page_fault at 904712e1
> #12 [a7d8ceec3f20] do_page_fault at 9047122e
> #13 [a7d8ceec3f50] page_fault at 90e012de
> 
> 
> The crash occurs when the secondary queue of the secondary process send pkts. 
> I have tested that the primary process and the first queue of the secondary 
> process do not crash. I use i40e X710 nic.
> 
> 
> Has anyone ever encountered simlilar issue? please provide some ideas for 
> fixing the issue.
> Thanks!
> 
If this is a kernel crash, it is a kernel bug. No matter what the application
does, VFIO in the kernel should not panic. What kernel version are you using?
Is it an upstream long-term-stable kernel?



Re: Help Running Example

2023-08-08 Thread Stephen Hemminger
On Tue, 8 Aug 2023 11:31:52 -0400
Alan Beadle  wrote:

> Here is how I checked what other devices are in the same group as the NIC:
> 
> I ran this command as root:
> dmesg|egrep group|awk '{print $NF" "$0}'|sort -n
> 
> Here is an excerpt of the output showing the group that the NIC is in:
> 
> 10 [   17.029705] pci :00:1f.0: Adding to iommu group 10
> 10 [   17.029732] pci :00:1f.2: Adding to iommu group 10
> 10 [   17.029761] pci :00:1f.3: Adding to iommu group 10
> 10 [   17.029788] pci :00:1f.4: Adding to iommu group 10
> 10 [   17.029815] pci :00:1f.5: Adding to iommu group 10
> 10 [   17.029842] pci :00:1f.6: Adding to iommu group 10
> 
> 
> And here is an excerpt of the lspci output showing what each of those
> devices is:
> 
> 00:1f.0 ISA bridge: Intel Corporation C621 Series Chipset LPC/eSPI
> Controller (rev 09)
> 00:1f.2 Memory controller: Intel Corporation C620 Series Chipset
> Family Power Management Controller (rev 09)
> 00:1f.3 Audio device: Intel Corporation Device a1f0 (rev 09)
> 00:1f.4 SMBus: Intel Corporation C620 Series Chipset Family SMBus (rev 09)
> 00:1f.5 Serial bus controller [0c80]: Intel Corporation C620 Series
> Chipset Family SPI Controller (rev 09)
> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (3)
> I219-LM (rev 09)
> 
> Based on this grouping, it seems like I can't feasibly unbind all of
> these, unless I misunderstand something.
> 
> -Alan
> 
> On Tue, Aug 8, 2023 at 11:25 AM Alan Beadle  wrote:
> >
> > Thanks Stephen. It looks like my memory controller is in the same
> > IOMMU group. I assume this means I won't be able to do this with this
> > NIC?
> >
> > -Alan
> >
> > On Mon, Aug 7, 2023 at 8:26 PM Stephen Hemminger
> >  wrote:  
> > >
> > > On Mon, 7 Aug 2023 12:40:21 -0700
> > > Stephen Hemminger  wrote:
> > >  
> > > > On Sun, 6 Aug 2023 11:33:43 -0400
> > > > Alan Beadle  wrote:
> > > >  
> > > > > Hi,
> > > > >
> > > > > I need some help getting DPDK working. I am running Ubuntu 20.04 with
> > > > > a modified Linux 5.4 kernel, but I have also tried the stock Ubuntu
> > > > > 5.15 kernel with the same results.
> > > > >
> > > > > Here is my NIC info from lspci:
> > > > > 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (3)
> > > > > I219-LM (rev 09)
> > > > >
> > > > > I built and installed DPDK from source, and applied the following boot
> > > > > flags: "intel_iommu=on iommu=pt"
> > > > >
> > > > > After booting I did the following as root:
> > > > > echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> > > > > ifconfig enp0s31f6 down
> > > > > dpdk-devbind.py --bind=vfio-pci :00:1f.6
> > > > >
> > > > > All of this appeared to work.
> > > > >
> > > > > I tried running the "skeleton" example program and got the following 
> > > > > output:
> > > > > sudo ./build/basicfwd
> > > > > EAL: Detected CPU lcores: 16
> > > > > EAL: Detected NUMA nodes: 1
> > > > > EAL: Detected shared linkage of DPDK
> > > > > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> > > > > EAL: Selected IOVA mode 'VA'
> > > > > EAL: VFIO support initialized
> > > > > EAL: :00:1f.6 VFIO group is not viable! Not all devices in IOMMU
> > > > > group bound to VFIO or unbound
> > > > > EAL: Requested device :00:1f.6 cannot be used
> > > > > TELEMETRY: No legacy callbacks, legacy socket not created
> > > > > EAL: Error - exiting with code: 1
> > > > >   Cause: Error: number of ports must be even
> > > > >
> > > > > I'm not at all familiar with DPDK or VFIO. What might the problem be?
> > > > >
> > > > > -Alan  
> > > >
> > > > IOMMU groups are when multiple PCI devices share the same channel
> > > > in the IOMMU. The group is used to determine what mapping to use when
> > > > device does DMA. Since this is a security thing, devices in same IOMMU
> > > > group can not be shared between kernel and non-kernel usage.
> > > >
> > > > The IOMMU group is determined by wiring on the motherboard.
> > > > Usually it is things like multiple Ethernet ports sharing the same 
> > > > group.
> > > > But can be much more confused.
> > > >
> > > > The only option is to unbind all devices in the group before using
> > > > one with DPDK.  
> > >
> > > More info on IOMMU groups is in kernel documentation:
> > > https://www.kernel.org/doc/html/latest/driver-api/vfio.html
> > >
> > > and in this article
> > > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/sect-iommu-deep-dive
> > >   


Right, you need to find a different system, use a VM, or add an external NIC to
use DPDK.


Re: Help Running Example

2023-08-07 Thread Stephen Hemminger
On Mon, 7 Aug 2023 12:40:21 -0700
Stephen Hemminger  wrote:

> On Sun, 6 Aug 2023 11:33:43 -0400
> Alan Beadle  wrote:
> 
> > Hi,
> > 
> > I need some help getting DPDK working. I am running Ubuntu 20.04 with
> > a modified Linux 5.4 kernel, but I have also tried the stock Ubuntu
> > 5.15 kernel with the same results.
> > 
> > Here is my NIC info from lspci:
> > 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (3)
> > I219-LM (rev 09)
> > 
> > I built and installed DPDK from source, and applied the following boot
> > flags: "intel_iommu=on iommu=pt"
> > 
> > After booting I did the following as root:
> > echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> > ifconfig enp0s31f6 down
> > dpdk-devbind.py --bind=vfio-pci :00:1f.6
> > 
> > All of this appeared to work.
> > 
> > I tried running the "skeleton" example program and got the following output:
> > sudo ./build/basicfwd
> > EAL: Detected CPU lcores: 16
> > EAL: Detected NUMA nodes: 1
> > EAL: Detected shared linkage of DPDK
> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > EAL: VFIO support initialized
> > EAL: :00:1f.6 VFIO group is not viable! Not all devices in IOMMU
> > group bound to VFIO or unbound
> > EAL: Requested device :00:1f.6 cannot be used
> > TELEMETRY: No legacy callbacks, legacy socket not created
> > EAL: Error - exiting with code: 1
> >   Cause: Error: number of ports must be even
> > 
> > I'm not at all familiar with DPDK or VFIO. What might the problem be?
> > 
> > -Alan  
> 
> IOMMU groups are when multiple PCI devices share the same channel
> in the IOMMU. The group is used to determine what mapping to use when
> device does DMA. Since this is a security thing, devices in same IOMMU
> group can not be shared between kernel and non-kernel usage.
> 
> The IOMMU group is determined by wiring on the motherboard.
> Usually it is things like multiple Ethernet ports sharing the same group.
> But can be much more confused.
> 
> The only option is to unbind all devices in the group before using
> one with DPDK.

More info on IOMMU groups is in kernel documentation:
https://www.kernel.org/doc/html/latest/driver-api/vfio.html

and in this article
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/sect-iommu-deep-dive


Re: Help Running Example

2023-08-07 Thread Stephen Hemminger
On Sun, 6 Aug 2023 11:33:43 -0400
Alan Beadle  wrote:

> Hi,
> 
> I need some help getting DPDK working. I am running Ubuntu 20.04 with
> a modified Linux 5.4 kernel, but I have also tried the stock Ubuntu
> 5.15 kernel with the same results.
> 
> Here is my NIC info from lspci:
> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (3)
> I219-LM (rev 09)
> 
> I built and installed DPDK from source, and applied the following boot
> flags: "intel_iommu=on iommu=pt"
> 
> After booting I did the following as root:
> echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> ifconfig enp0s31f6 down
> dpdk-devbind.py --bind=vfio-pci :00:1f.6
> 
> All of this appeared to work.
> 
> I tried running the "skeleton" example program and got the following output:
> sudo ./build/basicfwd
> EAL: Detected CPU lcores: 16
> EAL: Detected NUMA nodes: 1
> EAL: Detected shared linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: VFIO support initialized
> EAL: :00:1f.6 VFIO group is not viable! Not all devices in IOMMU
> group bound to VFIO or unbound
> EAL: Requested device :00:1f.6 cannot be used
> TELEMETRY: No legacy callbacks, legacy socket not created
> EAL: Error - exiting with code: 1
>   Cause: Error: number of ports must be even
> 
> I'm not at all familiar with DPDK or VFIO. What might the problem be?
> 
> -Alan

IOMMU groups are when multiple PCI devices share the same channel
in the IOMMU. The group is used to determine what mapping to use when
a device does DMA. Since this is a security thing, devices in the same
IOMMU group cannot be shared between kernel and non-kernel usage.

The IOMMU group is determined by the wiring on the motherboard.
Usually it is things like multiple Ethernet ports sharing the same group,
but it can be much more convoluted.

The only option is to unbind all devices in the group before using
one with DPDK.


Re: How to add packet capture framework to a custom simple dpdk app

2023-07-10 Thread Stephen Hemminger
On Sun, 9 Jul 2023 10:36:53 +0600
Fuji Nafiul  wrote:

> Hi,
> I am using dpdk_v22.11.1 on ubuntu_v22.04.2. I have a simple app derived
> from skeleton and icmpecho which can reply to proper arp requests and also
> can reply to appropriate pings. Now whats the proper steps to add a packet
> capture framework like dpdk-dumpcap here as the doc didnt clearly said it,
> rather pointed out to check the testpmd.c
> 
> I simply added pdump header files, then initialized rte_pdump_init(), then
> I tried to run dpdk-dumpcap separately that was successful after running
> testpmd app but with my custom app it  failed.to run Then I
> noticed configure_rxtx_dump_callbacks() in port initialization and tried to
> add it properly in my app but failed. I am just not sure whether I am on
> the right way or not. so please help if you have already passed this.
> thanks in advance..!

The rte_pdump_init() call handles registering the service that enables packet
capture. It causes the application (primary process) to listen for when a
secondary process wants to capture.

The capture application (dpdk-dumpcap) then makes a request to the primary
process. That request causes the dump callbacks to be installed. The
application itself should not need to change.

One non-obvious part is that the application has to be up and running
before the capture application starts.
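
Concretely, the only change in the primary application should be something
like this minimal sketch (port setup and the processing loop stay whatever
they already are); note the binary also has to be linked against librte_pdump:

#include <stdlib.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_pdump.h>

int
main(int argc, char **argv)
{
	if (rte_eal_init(argc, argv) < 0)
		rte_exit(EXIT_FAILURE, "EAL init failed\n");

	/* Register the pdump service so dpdk-dumpcap (a secondary
	 * process) can request capture from this primary process. */
	if (rte_pdump_init() < 0)
		rte_exit(EXIT_FAILURE, "rte_pdump_init failed\n");

	/* ... normal port setup and packet processing loop ... */

	rte_pdump_uninit();
	rte_eal_cleanup();
	return 0;
}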


Re: Multiple rte_launch_remote multiple times on main lcore

2023-06-27 Thread Stephen Hemminger
On Tue, 27 Jun 2023 20:20:28 +0200
Antonio Di Bacco  wrote:

> This is very useful. Anyway, just on main_lcore, could I launch many
> pthreads (with pthread_create) ?
> 
> Does this interfere with DPDK?

DPDK is not designed for random additional threads.
You could use rte_ctrl_thread_create or do the work of creating the threads,
binding them to cores, and registering them; but the startup does that already.

DPDK is designed for a run-to-completion model where there is one thread bound
to each core (and isolated).
You can try other things, but that makes things worse, not better:
performance will suffer, you can deadlock, and you will need additional
locking, etc.
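
If you really do need an extra thread for slow-path work (not packet
processing), a sketch using rte_ctrl_thread_create() would look roughly like
this; the thread is registered with EAL and, as far as I recall, its affinity
is kept on the EAL control-thread cpuset rather than the dataplane lcores:

#include <pthread.h>
#include <stdlib.h>
#include <rte_debug.h>
#include <rte_lcore.h>

static void *
slow_path_work(void *arg)
{
	(void)arg;
	/* housekeeping that must not run on a dataplane lcore */
	return NULL;
}

/* Call somewhere after rte_eal_init(). */
static void
start_slow_path_thread(void)
{
	pthread_t tid;

	if (rte_ctrl_thread_create(&tid, "slow-path", NULL,
				   slow_path_work, NULL) != 0)
		rte_exit(EXIT_FAILURE, "cannot create control thread\n");
}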


Re: Wrapping DPDK log messages with an application logger

2023-06-25 Thread Stephen Hemminger
On Thu, 22 Jun 2023 19:08:25 +0300
Dmitry Kozlyuk  wrote:

> Hi,
> 
> 2023-06-22 17:45 (UTC+0200), Lukáš Šišmiš:
> > I would think there could be opportunity to  pass my logging callback into
> > DPDK but I have not found it.  
> 
> FWIW, I too think a callback would be better than what DPDK offers now,
> because FILE* argument in rte_openlog_stream() is specific to C (libc).
> 
> > The only thing that I found was setting the stream (rte_openlog_stream()).  
> 
> Correct, this is what you're supposed to use.
> 
> > So I think I could use a Linux pipe to which DPDK would write to and the 
> > application would read the contents of it, parsing it into messages and 
> > logging it with the application logger.  
> 
> > Is there any alternative solution?  
> 
> https://man7.org/linux/man-pages/man3/fopencookie.3.html
> 
> It's effectively a way to pass the callback you want, just wrapped in a FILE*.


I have used fopencookie and rte_openlog_stream to handle this in the past.
Probably the code is still there in Danos and other projects.
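
A minimal sketch of that approach, where app_log() stands in for whatever the
application logger actually provides:

#define _GNU_SOURCE		/* fopencookie() is a GNU extension */
#include <stdio.h>
#include <sys/types.h>
#include <rte_log.h>

/* Placeholder for the application's own logging function. */
extern void app_log(const char *msg, size_t len);

static ssize_t
dpdk_log_write(void *cookie, const char *buf, size_t size)
{
	(void)cookie;
	app_log(buf, size);	/* hand DPDK's message to the app logger */
	return size;		/* tell stdio everything was consumed */
}

static void
hook_dpdk_logging(void)
{
	cookie_io_functions_t io = { .write = dpdk_log_write };
	FILE *f = fopencookie(NULL, "w", io);

	if (f != NULL) {
		setvbuf(f, NULL, _IOLBF, 0);	/* flush per log line */
		rte_openlog_stream(f);
	}
}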



Re: Multiple rte_launch_remote multiple times on main lcore

2023-06-25 Thread Stephen Hemminger
On Tue, 20 Jun 2023 17:33:59 +0200
Antonio Di Bacco  wrote:

> Is it possible to launch multiple threads on the main lcore?
> Who will be in charge of scheduling those threads on the main lcore
> (main lcore is isolated)?
> 
> Not the OS I suppose.
> 
> Thank you

If you start trying to add threads like this, it will lead to
all sorts of locking problems.  When one thread gets the lock
and then gets preempted by the scheduler and another thread
(bound to same lcore) tries to acquire the lock, it will spin
and wait until the first thread is rescheduled.

DPDK was designed for dedicated threads per lcore.


Re: Is the 25G Xeon D Integrated LAN supported?

2023-06-25 Thread Stephen Hemminger
On Mon, 19 Jun 2023 02:57:44 +
"Yang, Tao Y"  wrote:

> Please see the release notes
> 
> https://doc.dpdk.org/guides/rel_notes/release_23_03.html
> Intel® platforms with Intel® NICs combinations
> o   CPU
> §  Intel® Atom™ CPU C3758 @ 2.20GHz
> §  Intel® Xeon® CPU D-1553N @ 2.30GHz
> §  Intel® Xeon® CPU E5-2680 v2 @ 2.80GHz
> §  Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz
> §  Intel® Xeon® D-1749NT CPU @ 3.00GHz
> §  Intel® Xeon® D-2796NT CPU @ 2.00GHz
> §  Intel® Xeon® Gold 6139 CPU @ 2.30GHz
> §  Intel® Xeon® Gold 6140M CPU @ 2.30GHz
> §  Intel® Xeon® Gold 6252N CPU @ 2.30GHz
> §  Intel® Xeon® Gold 6348 CPU @ 2.60GHz
> §  Intel® Xeon® Platinum 8180 CPU @ 2.50GHz
> §  Intel® Xeon® Platinum 8280M CPU @ 2.70GHz
> §  Intel® Xeon® Platinum 8380 CPU @ 2.30GHz
> §  Intel® Ethernet Connection E823-L for QSFP
> §  Firmware version: 3.12 0x80017cf4 1.3243.0
> §  Device id (pf/vf): 8086:151d / 8086:1889
> §  Driver version: 1.11.14 (ice)
> §  OS Default DDP: 1.3.30.0
> §  COMMS DDP: 1.3.40.0
> §  Wireless Edge DDP: 1.3.10.0
> 
> 
> 
> From: Tom Barbette 
> Sent: Thursday, June 15, 2023 8:39 PM
> To: users@dpdk.org
> Subject: Is the 25G Xeon D Integrated LAN supported?
> 
> 
> Hi all,
> 
> Is the 25G LAN integrated to Xeon D processors, like the Intel® Xeon® 
> D-1732TE for instance, supported by DPDK? If so, which driver handles that?
> 
> Surely Intel integrated a strip-downed version of the E810 on the SoC, but 
> I'd like a confirmation on that :)
> 
> Thanks,
> 
> Tom

Looking at the Linux source, it appears to be a variant of the ICE device.
What is the PCI id?



Re: Scheduling of multiple RX/TX queues on a single port

2023-05-30 Thread Stephen Hemminger
On Mon, 29 May 2023 23:02:45 +0800
Fengkai Sun  wrote:

> Hi list,
> 
> I'm curious how DPDK programs the NIC to receive/transmit packets when
> there are multiple queues on a single port.
> 
> As for RX, the answer might be clear.
> The NIC can only receive a packet once at a time, since the cable only
> outputs one signal (0 or 1) at a time (correct me if I'm wrong).
> Therefore the NIC can receive a packet, check it's information, and finally
> put in into the right queue via some policies, e.g. RSS, all sequentially.
> 
> However, it confuses me when it comes to TX.
> As there are multiple TX queues on the same port, the NIC must decide which
> queue to get packets from when it's idle.
> This is where scheduling lies. How does the NIC select the queue?
> Round-Robin? Does it have to enforce fairness among the queues?
> 
> I'm wondering where I can find some documentation on this issue. Thank you!

Transmit scheduling is up to the hardware (not DPDK).
Generally I assume it is round-robin,
but there may be cases like priority queues (e.g. DCB) or large packets
with segmentation offload.


Re: DPDK hugepages

2023-05-25 Thread Stephen Hemminger
On Thu, 25 May 2023 05:36:02 +
"Lombardo, Ed"  wrote:

> Hi,
> I have two DPDK processes in our application, where one process allocates 
> 1024 2MB hugepages and the second process allocates 8 1GB hugepages.
> I am allocating hugepages in a script before the application starts.  This is 
> to satisfy different configuration settings and I don't want to write to grub 
> when second DPDK process is enabled or disabled.
> 
> Script that preconditions the hugepages:
> Process 1:
> mkdir /mnt/huge
> mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
> echo  1024  > 
> /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
> 
> Process 2:
> mkdir /dev/hugepages-1024
> mount -t hugetlbfs -o pagesize=1G none /dev/hugepages-1024
> echo 8 
> >/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
> 
> 
> Application -
> Process 1 DPDK EAL arguments:
> Const char *argv[] = { "app1", "-c", "7fc", "-n", "4", "--huge-dir", 
> "/dev/hugepages-1024", "--proc-type", "secondary"};
> 
> Process 2 DPDK EAL arguments:
> const  char *dpdk_argv_2gb[6]  = {"app1 ", "-c0x2", "-n4" , 
> "--socket-mem=2048", "--huge-dir /mnt/huge", "--proc-type primary"};
> 
> Questions:
> 
>   1.  Does DPDK support two hugepage sizes (2MB and 1GB) sharing app1?
This is a new scenario. I doubt it.

It is possible to have two processes share a common hugepage pool.


>   2.  Do I need to specify -proc-type for each Process shown above for 
> argument to the rte_eal_init()?
The problem is that DPDK uses a runtime directory to communicate.

If you want two disjoint DPDK primary processes, you need to set the runtime 
directory.
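
That is what the EAL --file-prefix option is for; each process then gets its
own /var/run/dpdk/<prefix> runtime directory and hugepage file names. A sketch
in the same style as your argv arrays (core masks here are made up):

/* Process 1 */
const char *argv1[] = { "app1", "-c", "0x7fc", "-n", "4",
			"--huge-dir", "/mnt/huge",
			"--file-prefix", "proc1" };

/* Process 2 */
const char *argv2[] = { "app1", "-c", "0x2", "-n", "4",
			"--huge-dir", "/dev/hugepages-1024",
			"--file-prefix", "proc2" };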

>   3.  I find the files in /dev/hugpages/rtemap_#s are not present once 
> Process 2 hugepages-1G/nr_hugepages are set to 8, but when set value to 1 the 
> /dev/hugepages/rtemap_# files (1024) are present.  I can't see how to resolve 
> this issue.  Any suggestions?
>   4.  Do I need to set -socket-mem to the total memory of both Processes, or 
> are they separately defined?  I have one NUMA node in this VM.
> 
> Thanks,
> Ed



Re: help with virtio_port

2023-05-23 Thread Stephen Hemminger
On Tue, 23 May 2023 16:46:24 +0100
Igor de Paula  wrote:

> Hi,
> I am running the DPDK version: 21.08.0 and Ubuntu 20.04.3 LTS.
> I have an application that uses KNI to interface with the kernel.
> I want to replace it with virtio_user ports as KNI will be deprecated in
> the future.
> Most of the functionality I am able to replace but there is one thing I am
> struggling with.
> In KNI we can add functions that will be called in case the network stack
> makes a request. The following code shows this:
> struct rte_kni *kni;
> struct rte_kni_conf conf;
> struct rte_kni_ops ops;
> struct rte_eth_dev_info dev_info;
> int ret;
> /* Clear conf at first */
> memset(, 0, sizeof(conf));
> conf.core_id = 0;
> memset(, 0, sizeof(ops));
> ops.port_id = ppo->id;
> ops.config_promiscusity = ippe_ppo_set_kni_promiscuous_mode;
> ops.change_mtu = ippe_ppo_set_kni_mtu;
> ops.config_network_if = ippe_ppo_set_kni_interface;
> ops.config_mac_address = ippe_ppo_set_kni_mac_address;
> kni = rte_kni_alloc(pktmbuf_pool[0], , );
> 
> 
> And there is a handle_request function supplied by KNI that calls these
> functions when need be,
> I haven't found any documentation on how to replace this functionality. I
> am no expert in how to set up and interact with the kernel stack, Some help
> on how to achieve this would be appreciated.

If you want to handle changes to the kernel network device, then you
will have to build a netlink listener that monitors these changes.
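
A bare-bones sketch of such a listener (a plain AF_NETLINK socket subscribed
to link events; error handling and attribute parsing are omitted):

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

static void
monitor_link_events(void)
{
	struct sockaddr_nl addr = {
		.nl_family = AF_NETLINK,
		.nl_groups = RTMGRP_LINK,	/* link up/down, MTU, MAC, ... */
	};
	char buf[8192];
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return;

	for (;;) {
		ssize_t len = recv(fd, buf, sizeof(buf), 0);
		struct nlmsghdr *nh;

		if (len <= 0)
			break;
		for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
		     nh = NLMSG_NEXT(nh, len)) {
			if (nh->nlmsg_type == RTM_NEWLINK)
				printf("link changed, re-read MTU/MAC/flags\n");
		}
	}
	close(fd);
}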


Re: Is it right way to test between two modes?

2023-05-11 Thread Stephen Hemminger
On Thu, 11 May 2023 13:47:23 +0900
이재홍  wrote:

> Hello,
> I'm really sorry to ask a lot... but I want to make sure that what I did is
> right or not
> 
> I compared two modes
> 1. default poll mode
> 2. interrupt mode (In my understand.. it is kind of an event driven mode in
> dpdk)
> 
> [Test 1] - poll mode
>  $ sudo ./examples/dpdk-l3fwd -l 1-3 -n 4 -- -p 0x3 --config="(0,0,1)"
> * power: about 36W*
> 
> [Test 2] - event driven mode
>  $ sudo ./examples/dpdk-l3fwd-power -l 1-3 -n 4 -- -p 0x03
> --config="(0,0,1)" --interrupt-only
> * power: about 12W*
> 
> 
> When Idle situation, Test1 uses a core 100% and Test2 doesn't use a core a
> lot (almost 0%), so I think I can say "In idle mode, Test2 Mode(event
> driven) reduces power by 1/3 compared to poll mode." Is it correct??
> 
> BR,
> Jaehong Lee

A couple of other notes.
1. Generating a real-world traffic pattern is important. Testing with something
like a full line-rate test of UDP is not real life. Ideally use a pattern of
multiple real machines to see the self-correlated TCP bursts, etc.

2. Measuring actual power would require something attached to the power supply.
The CPU load is not that great a final measure. One option would be to use
something like powertop, which can look at some of the kernel internals.


Re: DPDK 22.11 - How to fix memory leak for KNI - How to debug

2023-05-08 Thread Stephen Hemminger
On Mon, 8 May 2023 09:01:41 +0300
Yasin CANER  wrote:

> Hello Stephen,
> 
> Thank you for response, it helps me a lot. I understand problem better.
> 
> After reading mbuf library (
> https://doc.dpdk.org/guides/prog_guide/mempool_lib.html)  i realized that
> 31 units allocation memory slot doesn't return to pool!

If receive burst returns 1 mbuf, the other 31 pointers in the array
are not valid. They do not point to mbufs.

> 1 unit mbuf can be freed via rte_pktmbuf_free so it can back to pool.
> 
> Main problem is that allocation doesn't return to original pool, act as
> used. So, after following rte_pktmbuf_free
> 
> function,
> i realized that there is 2 function to helps to mbufs back to pool.
> 
> These are rte_mbuf_raw_free
> 
>  and rte_pktmbuf_free_seg
> .
> I will focus on them.
> 
> If there is another suggestion, I will be very pleased.
> 
> Best regards.
> 
> Yasin CANER
> Ulak



Re: Any way to change poll mode to event driven mode dynamically

2023-05-08 Thread Stephen Hemminger
On Mon, 8 May 2023 10:05:38 +0900
이재홍  wrote:

> Hi, guys!
> 
> I'm really interested in the energy-saving of DPDK apps.
> I have been thinking of lots of ideas and now what I want to do is
> change two modes(polling, event) dynamically.
> Is there any way to do that?
> I think it should share the same port in two modes
> 
> BR,
> Jaehong Lee

Look up receive interrupts. There is an example in l3fwd-power
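
The core of it, heavily stripped down from l3fwd-power (a hedged sketch; the
exact header names and the poll/sleep state machine are in that example):

#include <rte_ethdev.h>
#include <rte_epoll.h>

/* The port must be configured with port_conf.intr_conf.rxq = 1
 * before rte_eth_dev_configure(). */

static void
rx_intr_setup(uint16_t port_id, uint16_t queue_id)
{
	/* Once per queue: register its interrupt with the per-thread
	 * epoll fd of the lcore that polls it. */
	rte_eth_dev_rx_intr_ctl_q(port_id, queue_id, RTE_EPOLL_PER_THREAD,
				  RTE_INTR_EVENT_ADD, NULL);
}

static void
sleep_until_rx(uint16_t port_id, uint16_t queue_id)
{
	struct rte_epoll_event event;

	/* Called when the poll loop finds the queue empty: arm the
	 * interrupt, block until traffic arrives, go back to polling. */
	rte_eth_dev_rx_intr_enable(port_id, queue_id);
	rte_epoll_wait(RTE_EPOLL_PER_THREAD, &event, 1, -1);
	rte_eth_dev_rx_intr_disable(port_id, queue_id);
}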


Re: DPDK 22.11 - How to fix memory leak for KNI - How to debug

2023-05-04 Thread Stephen Hemminger
On Thu, 4 May 2023 13:00:32 +
Yasin CANER  wrote:

> In default-testing kni application works as below
> 
> 
>   1.  Call rte_kni_rx_burst function to get messages
>   2.  Then push to other KNI interface via rte_kni_tx_burst. There is no 
> memory-leak because  kni_free_mbufs is called and freed unused allocations.
> 
> On the other hand, in my scenario
> 
> 
>   1.  Call rte_kni_rx_burst func to get messages, burst_size is 32 but 1 
> packet is received from Kernel
>   2.  Then try to free all messages via rte_pktmbuf_free
>   3.  Freed 1 unit and 31 unit is not freed. memory leak
> 
> Other scenario,
> 
> 
>   1.  Call rte_kni_rx_burst func to  get messages, burst_size is 32 but 1 
> packet is received from Kernel
>   2.  Push to ethernet_device via rte_eth_tx_burst
>   3.  There is not any free operation by rte_eth_tx_burst
>   4.  Try to free via rte_pktmbuf_free
>   5.  1 unit is freed 31 unit is left in memory. Still memory leak


It looks like you are confused about the lifetime of mbufs and the "ownership"
of the mbuf.

When you do kni_rx_burst, one mbuf is full of data and returned. The other 31
slots are not used. Only the first mbuf is valid.

When an mbuf is passed to another DPDK device driver for transmit, the mbuf is
then owned by the device. This mbuf cannot be freed until the device has
completed the DMA and finished transmitting it. Also, many devices defer
freeing transmit mbufs as an optimization. There is some limited control over
the transmit freeing via tx_free_thresh. See the DPDK programmer's guide for
more info:
https://doc.dpdk.org/guides/prog_guide/poll_mode_drv.html
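
Put concretely, only the mbufs actually returned by the rx burst are yours,
and after a tx burst you free only the ones the driver did not accept.
Roughly (a sketch using your existing kni and port handles):

#include <rte_ethdev.h>
#include <rte_kni.h>
#include <rte_mbuf.h>

static void
forward_from_kni(struct rte_kni *kni, uint16_t port_id)
{
	struct rte_mbuf *pkts[32];
	unsigned int nb_rx, i;
	uint16_t nb_tx;

	/* Only the first nb_rx entries of pkts[] are valid mbufs. */
	nb_rx = rte_kni_rx_burst(kni, pkts, 32);
	if (nb_rx == 0)
		return;

	/* Ownership of the accepted mbufs moves to the NIC driver;
	 * it frees them after transmit (possibly deferred). */
	nb_tx = rte_eth_tx_burst(port_id, 0, pkts, nb_rx);

	/* Free only what the driver did not take. */
	for (i = nb_tx; i < nb_rx; i++)
		rte_pktmbuf_free(pkts[i]);
}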



Re: Generic flow string parser

2023-04-28 Thread Stephen Hemminger
On Fri, 28 Apr 2023 17:04:46 -0700
Stephen Hemminger  wrote:

> On Fri, 28 Apr 2023 12:13:26 -0700
> Cliff Burdick  wrote:
> 
> > Hi Stephen, it would definitely not be worthwhile to repeat everything
> > that's already tested with testpmd. I was thinking that given that there
> > already is a "flow_parse" function that does almost everything needed,
> > something like that could be exposed. If we think of the testpmd flow
> > string as a sort of "IR" for string flow specification, that would allow
> > others to implement higher-level transform of a schema like JSON or YAML
> > into the testpmd language. Due to the complexity of testpmd and how it's
> > the source of true for testing flows, I think it's too great of an ask to
> > have testpmd support a new type of parsing. My only suggestion would be to
> > take what already exists and expose it in a public API that is included in
> > a DPDK install.
> > 
> > If you look at the "flow_classify" example in DPDK you can already see that
> > for that application someone had to write another flow text parser for a
> > format they made up. Instead, that example could be converted over to this
> > other API as well.  
> 
> Please don't top post.
> 
> The naming issue is that almost all libraries in DPDK start with rte_ prefix
> and the testpmd functions do not.
> 
> The flow_classify example is pretty much abandonware at this point.
> Code is not updated, other than build breakages.
> Last time I looked at it noticed lots of code reinvention useless code,
> and only supports IPv4. It really needs a rewrite.

I would rather the flow parser was rewritten as well. An open-coded parser is
much more error-prone and harder to extend than writing the parser in yacc/lex
(i.e. bison/flex).


Re: Generic flow string parser

2023-04-28 Thread Stephen Hemminger
On Fri, 28 Apr 2023 12:13:26 -0700
Cliff Burdick  wrote:

> Hi Stephen, it would definitely not be worthwhile to repeat everything
> that's already tested with testpmd. I was thinking that given that there
> already is a "flow_parse" function that does almost everything needed,
> something like that could be exposed. If we think of the testpmd flow
> string as a sort of "IR" for string flow specification, that would allow
> others to implement higher-level transform of a schema like JSON or YAML
> into the testpmd language. Due to the complexity of testpmd and how it's
> the source of true for testing flows, I think it's too great of an ask to
> have testpmd support a new type of parsing. My only suggestion would be to
> take what already exists and expose it in a public API that is included in
> a DPDK install.
> 
> If you look at the "flow_classify" example in DPDK you can already see that
> for that application someone had to write another flow text parser for a
> format they made up. Instead, that example could be converted over to this
> other API as well.

Please don't top post.

The naming issue is that almost all libraries in DPDK start with the rte_
prefix and the testpmd functions do not.

The flow_classify example is pretty much abandonware at this point.
The code is not updated, other than for build breakages.
Last time I looked at it, I noticed lots of code reinvention and useless code,
and it only supports IPv4. It really needs a rewrite.


Re: Generic flow string parser

2023-04-28 Thread Stephen Hemminger
On Fri, 28 Apr 2023 17:36:51 +
Tom Barbette  wrote:

> Hi all!
> 
> I totally agree with this.
> 
> In FastClick we link against a copy of the test-pmd source code to call the 
> parser externally. We just have to patch a bit some files (see 
> https://github.com/tbarbette/fastclick/blob/main/userlevel/rte_parse.mk, and 
> used here : 
> https://github.com/tbarbette/fastclick/blob/main/lib/flowruleparser.cc). It 
> actually worked fairly well until a structure named "template" appeared, 
> which is a registered keyword in C++, and prevent compilation now even under 
> extern "C". This can be patched too but did not find the time yet.
> 
> So a clean solution would be more than nice. It's not only the 12K lines of 
> codes, it's also the "testpmd syntax" which is known, and appears in a lot of 
> examples here and there.
> 
> Given the relatively easy (but hacky) integration we have, a clean library 
> wouldn't probably be very difficult.
> 
> 
> Tom
> 
> Le 27/04/23 à 15:19, Cliff Burdick a écrit :
> Hi Thomas, testpmd has a 12,000 line parser just for taking in strings and 
> converting it to flow rules. This is obviously useful for testing flows, but 
> it also is an interface for any type of flow imaginable since this is where 
> they're tested.
> 
> Now imagine you're developing an application that allows the user to specify 
> custom flows in a config. Your only option is to make your own flow string 
> input (json, etc) and convert that to to the flow spec. This is reinventing 
> almost all of what testpmd already does, and it's extremely error-prone. I 
> think it would be very useful to have this as an API call rather than a user 
> constructing each flow by hand so that all these other applications can 
> benefit and not be worried about bugs during conversions.
> 
> 
> 
> On Thu, Apr 27, 2023, 01:37 Thomas Monjalon 
> mailto:tho...@monjalon.net>> wrote:
> 26/04/2023 07:47, David Marchand:
> > On Wed, Apr 26, 2023 at 6:47 AM Cliff Burdick 
> > mailto:shakl...@gmail.com>> wrote:  
> > >
> > > Does anyone know if a generic parser for flow strings exists anywhere? 
> > > The one inside of testpmd is ideal, but unfortunately it's self-contained 
> > > and not distributed as part of a normal DPDK install. This seems like 
> > > something that is likely reinvented over and over and it would be useful 
> > > if there was a single API to take in strings and generate flows.  
> >
> > I heard this same question in the past, but I don't remember the answer.
> > Copying Thomas and Ori who might know.  
> 
> I'm not sure how the testpmd code could help another application.
> And in general, if your application has a CLI,
> you need to integrate the flow commands in a broader context.

Exposing the parser for use would require some renaming of functions,
documentation, and a test suite. The testing would be the hardest part.
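
For illustration, here is a minimal sketch of what building a flow rule by hand looks
like with the rte_flow API; the port id, queue index and UDP destination port are
arbitrary example values, not anything from this thread:

#include <rte_flow.h>
#include <rte_ethdev.h>
#include <rte_byteorder.h>

/* Sketch: match IPv4/UDP packets with destination port 4789 and steer
 * them to RX queue 1 on the given port. */
static struct rte_flow *
make_example_flow(uint16_t port_id)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item_udp udp_spec = { .hdr.dst_port = RTE_BE16(4789) };
	struct rte_flow_item_udp udp_mask = { .hdr.dst_port = RTE_BE16(0xffff) };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
		{ .type = RTE_FLOW_ITEM_TYPE_UDP,
		  .spec = &udp_spec, .mask = &udp_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = 1 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;

	/* Validate first so unsupported matches are caught, then create. */
	if (rte_flow_validate(port_id, &attr, pattern, actions, &error) != 0)
		return NULL;
	return rte_flow_create(port_id, &attr, pattern, actions, &error);
}

The roughly equivalent testpmd string would be something like
"flow create 0 ingress pattern eth / ipv4 / udp dst is 4789 / end actions queue index 1 / end",
which is exactly the kind of translation an exposed parser would perform.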


Re: quick question about core affinity

2023-04-26 Thread Stephen Hemminger
On Wed, 26 Apr 2023 14:20:30 +0200
Lukáš Šišmiš  wrote:

> Hi,
> 
> 
> DPDK core affinity runs your application on the selected cores. But that 
> doesn't stop other applications from running on the same cores.
> 
> To get closer to your goal of really isolating the application from 
> other processes you would need to add isolcpus to your boot parameters.
> 
> That instructs the scheduler to not use the mentioned cores. After 
> booting with this parameter you could run your DPDK application and 
> scheduler would not schedule any process to the cores that DPDK 
> application would use.
> 
> However, if you run a separate application and with the taskset command 
> pin it to the cores your DPDK application uses that will still run and 
> will be in conflict with your DPDK app.
> 
> 
> Best regards,
> 
> Lukas
> 
> On 26. 04. 23 13:37, 이재홍 wrote:
> > Hello, I'm new to DPDK
> >
> > I've tried to run samples and got a query about core affinity.
> > As I understand, if a lcore has affinity to a CPU set, it will run 
> > only on the CPU set.
> > And I thought If I run a dpdk sample with core 0-2, none process can 
> > use the core (0-2). but when I try to run a simple app(not dpdk app) 
> > with taskset command, it runs on 0, 1, 2 cores..
> >
> > what I want was if I use cores for dpdk apps none other process can 
> > access the cores.. but it seems possible..
> >
> > I've googled to find out this but I couldn't find anything I wanted.
> > Is there anyone can explain about this...?

Look up "DPDK core isolation".

More detail in here: https://www.suse.com/c/cpu-isolation-introduction-part-1/

There are multiple ways to do this, the simplest one is to set the kernel
command line so that on boot the scheduler does not use the isolated cores.
The more complex one recommended for production is to use cgroups
and systemd.

You can't isolate CPU 0. It is special and used for system interrupts etc.
In general, don't use CPU 0 for DPDK applications.

There are other performance tuning considerations such as IRQ affinity,
nohz_full and rcu isolation that are also worth looking at.





Re: dpdk-pdump cannot init tailq as secondary process

2023-04-24 Thread Stephen Hemminger
On Thu, 20 Apr 2023 12:18:15 +
postmaster  wrote:

> Hello
> 
> I follow what it is explained on that page
> 
> https://doc.dpdk.org/guides/tools/pdump.html
> 
> to call rte_pdump_init in my application (and checking the result, if not ok 
> exit with failure), but once I ran dpdk-pdump I got
> 
> 
> dpdk-pdump  -l 9 -- --pdump 'port=0,queue=*,rx-dev=/tmp/rx.pcap'
> EAL: Detected CPU lcores: 24
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_1027261_2ca45105bcf34
> EAL: Selected IOVA mode 'PA'
> EAL: VFIO support initialized
> EAL: Cannot initialize tailq: RTE_FIB

Looks like pdump is not being run as a secondary process.
Try adding --proc-type secondary

Also, pdump is a legacy application; please try dpdk-dumpcap instead.
Dumpcap supports more information, multiple interfaces, etc.
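
For reference, a minimal sketch of what the primary application must do so that
dpdk-pdump or dpdk-dumpcap can attach as a secondary process (error handling is
reduced to bare checks; this follows the pdump guide, not anything specific to
the report above):

#include <rte_eal.h>
#include <rte_pdump.h>

int main(int argc, char **argv)
{
	if (rte_eal_init(argc, argv) < 0)
		return -1;

	/* Register the packet-capture framework so a secondary process
	 * (dpdk-pdump or dpdk-dumpcap) can request mirroring of RX/TX
	 * packets over the multi-process channel. */
	if (rte_pdump_init() < 0)
		return -1;

	/* ... normal port setup and RX/TX loop ... */

	rte_pdump_uninit();	/* tear down before exit */
	rte_eal_cleanup();
	return 0;
}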



Re: eal-intr-thread, rte_mp_handle threads of secondary running on ISOLATED cores

2023-04-22 Thread Stephen Hemminger
On Fri, 21 Apr 2023 23:34:48 +0200
Antonio Di Bacco  wrote:

> I have a primary process whose
> eal-intr-thread
> rte_mp_handle
> threads are running correctly on "non-isolated" cores.
> 
> The primary forks a secondary whose
> eal-intr-thread
> rte_mp_handle
> threads are running on "isolated" cores.
> 
> I'm using DPDK 21.11, I believe this is a bug. What is your opinion?
> 
> BR,
> Anna.

This email is missing lots of info.

1. What is the bug you are seeing, what is the log?

2. Why are you trying to start DPDK in non-standard way?
Don't try and be creative and do things in new and different ways
unless you are trying to find new and different bugs!


Re: Issue with rte_eth_dev_start

2023-04-21 Thread Stephen Hemminger
On Fri, 21 Apr 2023 13:39:18 +0100
Igor de Paula  wrote:

> From what I can tell, dpdk calls the function vfio_disable_msix in the
> stop and start function. The reason it does not happen in start up is that
> I don't call stop before.
> Calling stop and then start calls the function twice. Which maybe shouldn't
> happen... Is it a bug?
> 
> On Thu, Apr 20, 2023 at 12:33 PM Igor de Paula  wrote:
> 
> > Hi,
> > I am having trouble with restarting a HW port allocated to DPDK.
> > I am running the DPDK version: 21.08.0 and Ubuntu 20.04.3 LTS.
> > The driver is : net_e100igb
> > After I start the port with no issues I try to call rte_eth_dev_stop to
> > stop it.
> > When I am ready I call rte_eth_dev_start to start it again an I get the
> > following message:
> > EAL: Error disabling MSI-X interrupts for fd 46.
> > I am not sure what this error is coming from and what it causes.
> > I found little information online. If someone could explain, I would
> > really appreciate it.
> > This is a copy print on setup with no issues:
> > 2023-04-20 11:09:51.703542: Driver: net_e1000_igb
> > 2023-04-20 11:09:51.703546: Bus Id: :01:00.0
> > 2023-04-20 11:09:51.703551: rx offload cap: 280e (92e0f)
> > 2023-04-20 11:09:51.703555: tx offload cap: 8002 (803f)
> > 2023-04-20 11:09:51.703559: NUMA Socket: 0
> > 2023-04-20 11:09:51.703563: MAC Address: b4:96:91:63:62:40
> > 2023-04-20 11:09:51.703568: Max Rx Queue: 8
> > 2023-04-20 11:09:51.703572: Max Tx Queue: 8
> > 2023-04-20 11:09:51.703577: Max Rx Descriptors: 4096
> > 2023-04-20 11:09:51.703581: Max Tx Descriptors: 4096
> > 2023-04-20 11:09:51.703585: Max Rx Packet Length: 16383
> > 2023-04-20 11:09:51.703589: Available Link Speeds: 10Mb/s 100Mb/s 1Gb/s
> > 2023-04-20 11:09:51.703608: Fixed Link Speed: Auto
> > 2023-04-20 11:09:51.703616: Fixed Duplex: Auto
> > 2023-04-20 11:09:51.703623: MTU Set to: 16383
> > 2023-04-20 11:09:51.703632: Actual Rx Descriptors: 4096
> > 2023-04-20 11:09:51.703636: Actual Tx Descriptors: 4096
> > 2023-04-20 11:09:51.703695: Set up 1 send queues
> > 2023-04-20 11:09:51.703700: Actual MTU: 16383
> > 2023-04-20 11:09:51.703704: Actual Linkspeed: 0
> > 2023-04-20 11:09:51.703708: Actual duplex: 0
> > 2023-04-20 11:09:51.703713: Successfully set port interrupt event
> > 2023-04-20 11:09:51.818085: Flow control turned off for Port 0
> > 2023-04-20 11:09:51.844179: Port 0 up and running
> > 2023-04-20 11:09:51.844303: Event type: LSC interrupt
> > 2023-04-20 11:09:51.844382: Port 0 Link Down
> >
> > And after I stop and start:
> > 2023-04-20 11:11:00.492428: Driver: net_e1000_igb
> > 2023-04-20 11:11:00.492476: Bus Id: :01:00.0
> > 2023-04-20 11:11:00.492526: rx offload cap: 280e (92e0f)
> > 2023-04-20 11:11:00.492576: tx offload cap: 8002 (803f)
> > 2023-04-20 11:11:00.492624: NUMA Socket: 0
> > 2023-04-20 11:11:00.492672: MAC Address: b4:96:91:63:62:40
> > 2023-04-20 11:11:00.492721: Max Rx Queue: 8
> > 2023-04-20 11:11:00.492770: Max Tx Queue: 8
> > 2023-04-20 11:11:00.492813: Max Rx Descriptors: 4096
> > 2023-04-20 11:11:00.492851: Max Tx Descriptors: 4096
> > 2023-04-20 11:11:00.492889: Max Rx Packet Length: 16383
> > 2023-04-20 11:11:00.492958: Available Link Speeds: 10Mb/s 100Mb/s 1Gb/s
> > 2023-04-20 11:11:00.493207: Fixed Link Speed: Auto
> > 2023-04-20 11:11:00.493301: Fixed Duplex: Full
> > 2023-04-20 11:11:00.493822: MTU Set to: 16383
> > 2023-04-20 11:11:00.493889: Actual Rx Descriptors: 4096
> > 2023-04-20 11:11:00.493936: Actual Tx Descriptors: 4096
> > 2023-04-20 11:11:00.494190: Actual MTU: 16383
> > 2023-04-20 11:11:00.494229: Actual Linkspeed: 0
> > 2023-04-20 11:11:00.494266: Actual duplex: 1
> > 2023-04-20 11:11:00.494305: Successfully set port interrupt event
> > *EAL: Error disabling MSI-X interrupts for fd 46*
> > 2023-04-20 11:11:00.603181: Flow control turned off for Port 0
> > 2023-04-20 11:11:00.629151: Port 0 up and running
> > 2023-04-20 11:11:00.629222: Event type: LSC interrupt
> > 2023-04-20 11:11:00.629273: Port 0 Link Down
> >
> >
> > Thanks,
> > Igor

Are you using Link State (LSC) or receive interrupts?
Did you start/stop the tx and rx queues? Could be a device bug where
it assumes all queues were stopped.

Also check kernel dmesg output; VFIO might print an error message there.
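
As a rough sketch of the stop/start sequencing being suggested (the helper name,
port id and queue counts are placeholders; return values are only minimally checked):

#include <rte_ethdev.h>

/* Stop all queues explicitly before stopping the port, then restart.
 * Some PMDs misbehave if queues are still considered started when the
 * device is stopped and started again. */
static int
restart_port(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
{
	uint16_t q;
	int ret;

	for (q = 0; q < nb_rxq; q++)
		rte_eth_dev_rx_queue_stop(port_id, q);
	for (q = 0; q < nb_txq; q++)
		rte_eth_dev_tx_queue_stop(port_id, q);

	ret = rte_eth_dev_stop(port_id);
	if (ret != 0)
		return ret;

	/* Non-deferred queues are started again along with the port. */
	return rte_eth_dev_start(port_id);
}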


Re: [GRO] check whether ip_id continuity needs to be checked when two TCP packets are merged.

2023-04-19 Thread Stephen Hemminger
On Thu, 20 Apr 2023 02:30:41 +
"Hu, Jiayu"  wrote:

> Hi Cheng,
> 
> > -Original Message-
> > From: jiangheng (G) 
> > Sent: Saturday, April 15, 2023 10:46 PM
> > To: users@dpdk.org; Hu, Jiayu ; d...@dpdk.org
> > Subject: [GRO] check whether ip_id continuity needs to be checked when
> > two TCP packets are merged.
> > 
> > Hi jiayu.hu
> > 
> > It cannot be guaranteed that 16bit identification field of ip packets in the
> > same tcp stream will be continuous.
> > Please help check whether ip_id continuity needs to be checked when two
> > TCP packets are merged?
> > Seems to modify the following code, gro will aggregate better, and work
> > better:
> > 
> > diff --git a/lib/gro/gro_tcp4.h b/lib/gro/gro_tcp4.h index
> > 212f97a042..06faead7b5 100644
> > --- a/lib/gro/gro_tcp4.h
> > +++ b/lib/gro/gro_tcp4.h
> > @@ -291,12 +291,10 @@ check_seq_option(struct gro_tcp4_item *item,
> > /* check if the two packets are neighbors */
> > len = pkt_orig->pkt_len - l2_offset - pkt_orig->l2_len -
> > pkt_orig->l3_len - tcp_hl_orig;
> > -   if ((sent_seq == item->sent_seq + len) && (is_atomic ||
> > -   (ip_id == item->ip_id + 1)))
> > +   if (sent_seq == item->sent_seq + len)  
> 
> For atomic packets, the IP ID field is ignored, as it can be set in various 
> ways.
> For non-atomic packets, it follows Linux kernel tcp_gro_receive().
> 
> Is this change specific to your case? Can you give more details on why it 
> helps?

Many OS's don't change IP ID if DF bit is set.
See RFC 6864 for details:
    "The IPv4 ID field MUST NOT be used for purposes other than
     fragmentation and reassembly."
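
For context, a small sketch of how the "atomic datagram" test (DF set, not a
fragment) can be expressed with the DPDK IPv4 header definitions; per RFC 6864
the IP ID may only be ignored for atomic datagrams. The macros and field names
are from rte_ip.h, the helper itself is illustrative:

#include <stdbool.h>
#include <rte_ip.h>
#include <rte_byteorder.h>

/* An IPv4 datagram is "atomic" (RFC 6864) when DF is set and it is not
 * a fragment: MF clear and fragment offset zero.  Only then can the
 * IP ID field be ignored when deciding whether two packets are
 * neighbors for GRO. */
static bool
ipv4_is_atomic(const struct rte_ipv4_hdr *ip)
{
	uint16_t frag = rte_be_to_cpu_16(ip->fragment_offset);

	return (frag & RTE_IPV4_HDR_DF_FLAG) &&
	       !(frag & RTE_IPV4_HDR_MF_FLAG) &&
	       (frag & RTE_IPV4_HDR_OFFSET_MASK) == 0;
}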


Re: Issues with basicfwd

2023-04-19 Thread Stephen Hemminger
On Wed, 19 Apr 2023 16:31:37 -0700
Verghis Koshi  wrote:

> Hi Stephen,
> Thanks for the response.
> I've tried with two virtual NICs, each bound to VFIO, as you can see
> below.
> 
> verghis@verghis-VirtualBox:~/dpdk-stable-22.11.1/build$
> ../usertools/dpdk-devbind.py --status
> 
> Network devices using DPDK-compatible driver
> 
> :00:08.0 '79c970 [PCnet32 LANCE] 2000' drv=vfio-pci unused=pcnet32
> :00:09.0 '79c970 [PCnet32 LANCE] 2000' drv=vfio-pci unused=pcnet32

Pcnet32 is not a device supported by DPDK.

To use DPDK with a hardware device, the DPDK library must have a driver
for that hardware since the library interacts directly with hardware registers
and rings. 


Re: Issues with basicfwd

2023-04-18 Thread Stephen Hemminger
On Tue, 18 Apr 2023 17:14:21 -0700
Verghis Koshi  wrote:

> It appears that the vdev_device_list is empty - isn't this where the
> probe function for VFIO lives?
> 
> rte_bus_probe, file ../lib/eal/common/eal_common_bus.c, vbus->name vdev
> vdev_probe, file ../drivers/bus/vdev/vdev.c, PROBE, _device_list
> 0x56337bb30d30
> vdev_probe, file ../drivers/bus/vdev/vdev.c, dev is (nil)
> 
> This is how I call basicfwd, am I missing something?
> 
> sudo examples/dpdk-skeleton -l 1 -n 4
> 
> Thanks,.
> 
> Verghis
> 
> 
> 
> 
> m
> 
> On Tue, Apr 18, 2023 at 10:11 AM Verghis Koshi  wrote:
> 
> > I'm having trouble running the basicfwd example and would appreciate
> > any help.
> > I'm running Linux Mint 21.1 inside VirtualBox, and I've created two
> > NICs:
> >
> > verghis@verghis-VirtualBox:~/dpdk-stable-22.11.1/build$
> > ../usertools/dpdk-devbind.py --status
> >
> > Network devices using DPDK-compatible driver
> > 
> > :00:03.0 '79c970 [PCnet32 LANCE] 2000' drv=vfio-pci unused=pcnet32
> >
> > Network devices using kernel driver
> > ===
> > :00:08.0 '82540EM Gigabit Ethernet Controller 100e' if=enp0s8
> > drv=e1000 unused=vfio-pci *Active*
> >
> > The first is bound to vfio-pci, to be used by basicfwd, and the second
> > uses the normal e1000 driver.
> > But when I run the code, it doesn't seem to see the VFIO driver at
> > all; further, it seems to think that 00:08.0
> > is using a non-kernel driver - why?
> > Here's the debug output; it should pick up the single VFIO port.  I
> > don't care about the 'even number of ports', that's
> > easy to fix.
> > My apologies if I'm overlooking something simple.
> >
> > verghis@verghis-VirtualBox:~/dpdk-stable-22.11.1/build$ sudo
> > examples/dpdk-skeleton -l 1 -n 4
> > EAL: Detected CPU lcores: 2
> > EAL: Detected NUMA nodes: 1
> > EAL: Detected static linkage of DPDK
> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > EAL: VFIO support initialized
> > rte_vfio_enable, file ../lib/eal/linux/eal_vfio.c, VFIO support initialized
> > 0: examples/dpdk-skeleton (rte_dump_stack+0x42) [55bed7d42d62]
> > 1: examples/dpdk-skeleton (55bed6cbb000+0x23b39f) [55bed6ef639f]
> > 2: examples/dpdk-skeleton (55bed6cbb000+0x239211) [55bed6ef4211]
> > 3: examples/dpdk-skeleton (main+0xf) [55bed70ac51f]
> > 4: /lib/x86_64-linux-gnu/libc.so.6 (7fbde5fb2000+0x29d90) [7fbde5fdbd90]
> > 5: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0x80) [7fbde5fdbe40]
> > 6: examples/dpdk-skeleton (_start+0x25) [55bed7b86055]
> > rte_bus_probe, file ../lib/eal/common/eal_common_bus.c, bus->name auxiliary
> > rte_bus_probe, file ../lib/eal/common/eal_common_bus.c, bus->name dpaa_bus
> > rte_bus_probe, file ../lib/eal/common/eal_common_bus.c, bus->name fslmc
> > rte_bus_probe, file ../lib/eal/common/eal_common_bus.c, bus->name ifpga
> > rte_bus_probe, file ../lib/eal/common/eal_common_bus.c, bus->name pci
> > pci_probe, file ../drivers/bus/pci/pci_common.c, dev->name :00:01.1
> > pci_probe, file ../drivers/bus/pci/pci_common.c, dev->name :00:02.0
> > pci_probe, file ../drivers/bus/pci/pci_common.c, dev->name :00:03.0
> > pci_probe, file ../drivers/bus/pci/pci_common.c, dev->name :00:04.0
> > pci_probe, file ../drivers/bus/pci/pci_common.c, dev->name :00:05.0
> > pci_probe, file ../drivers/bus/pci/pci_common.c, dev->name :00:06.0
> > pci_probe, file ../drivers/bus/pci/pci_common.c, dev->name :00:07.0
> > pci_probe, file ../drivers/bus/pci/pci_common.c, dev->name :00:08.0
> > rte_pci_map_device, file ../drivers/bus/pci/linux/pci.c,
> > rte_pci_device->name :00:08.0, dev->kdrv 0
> > pci_probe, file ../drivers/bus/pci/pci_common.c, dev->name :00:0d.0
> > rte_bus_probe, file ../lib/eal/common/eal_common_bus.c, bus->name vmbus
> > rte_bus_probe, file ../lib/eal/common/eal_common_bus.c, bus->name dsa
> > rte_bus_probe, file ../lib/eal/common/eal_common_bus.c, vbus->name vdev
> > vdev_probe, file ../drivers/bus/vdev/vdev.c, PROBE, _device_list
> > 0x55bed8764d30
> > vdev_probe, file ../drivers/bus/vdev/vdev.c, dev is (nil)
> > TELEMETRY: No legacy callbacks, legacy socket not created
> > main, file ../examples/skeleton/basicfwd.c, nb_ports 0
> > EAL: Error - exiting with code: 1
> >   Cause: Error: number of ports must be even
> >
> > Verghis
> >  

You need to create two virtual NICs and bind them to VFIO, which
removes them from the kernel driver. I am not familiar with VirtualBox config.
The virtual NIC in VirtualBox is not the same as virtio.
If it depends on a proprietary kernel driver, then
you are unlikely to get DPDK to work in a VirtualBox environment.


Re: rte_eal_remote_launch or pthread_create on main lcore

2023-04-13 Thread Stephen Hemminger
On Thu, 13 Apr 2023 15:28:49 +0200
Antonio Di Bacco  wrote:

> My main lcore is sitting there just waiting for some message to do
> non-real time things.
> I would like to launch a thread on main lcore.
> I could use a pthread or rte_eal thread, which one would you recommend?
> 
> Regards,
> Anna

If you don't need the main lcore, then just don't pass the SKIP_MASTER flag,
or have the main lcore wait for the workers to exit.
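
A minimal sketch of the two options; newer DPDK releases spell the flags
SKIP_MAIN/CALL_MAIN, and the worker function and do_control_plane_work() here
are placeholders:

#include <rte_launch.h>
#include <rte_lcore.h>

static int worker(void *arg)		/* placeholder lcore function */
{
	(void)arg;
	/* ... poll queues, handle messages ... */
	return 0;
}

static int run(void)
{
	/* Option 1: run worker() on every lcore, including the main one. */
	rte_eal_mp_remote_launch(worker, NULL, CALL_MAIN);

	/* Option 2: launch only the workers, do non-real-time work on the
	 * main lcore, then wait for the workers:
	 *
	 *	rte_eal_mp_remote_launch(worker, NULL, SKIP_MAIN);
	 *	do_control_plane_work();
	 */

	rte_eal_mp_wait_lcore();
	return 0;
}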


Re: ethtool-like command on ice pmd

2023-04-13 Thread Stephen Hemminger
On Thu, 13 Apr 2023 09:53:12 +0200
"Ernesto Ruffini"  wrote:

> Thank you Stephen,
> I would really like the PMD driver could implement it.
> Any idea on where to start?
> I saw the ethtool sends a message to kernel, but from there I cannot
> understand where (and how) this is implemented in the ICE driver.
> Porting it to the PMD driver would then be another topic, but at least we
> can quantify the effort

I don't work for Intel or have the hardware...
But the likely implementation in DPDK would involve adding device specific 
devargs.



Re: [dpdk-dev][dpdk-users] A problem about memory may not be all-zero allocated by rte_zmalloc_socket()

2023-04-12 Thread Stephen Hemminger
On Wed, 23 Feb 2022 15:38:09 +
Honnappa Nagarahalli  wrote:

> I have a question, does the dpdk code implement to ensure that the memory 
> initialization is 0?
> [Ruifeng] Clearing of the memory should be done by the kernel. In section 
> 3.1.4.6 of Programmer's Guide, it says: "
> Hugepages are cleared by the kernel when a file in hugetlbfs or its part is 
> mapped for the first time system-wide to prevent data leaks from previous 
> users of the same hugepage".
> http://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html#memory-mapping-discovery-and-memory-reservation
> [Yunjian] Thanks. However, hugepages are not cleared by the kernel(version 
> 4.19.90) on the ARM platform.
> [Honnappa] I think that is besides the point we are discussing. 
> rte_zmalloc_socket should be able to zero the memory every time it is called 
> (not just the first time).
> 
> I see that rte_zmalloc_socket explicitly clears the memory using memset when 
> the RTE_MALLOC_DEBUG is enabled. Have you tested with RTE_MALLOC_DEBUG 
> enabled?
> 
> 
> Thanks,
> Yunjian

Normally:
  - Hugepage memory is zero'd by the kernel when mapped in. DPDK assumes this
    because the overhead of zeroing large amounts of memory can impact
    application startup time. If the kernel is not zeroing, then your kernel is buggy.
  - When memory is freed by rte_free() it is set to zero before returning to the pool.
  - When malloc gets memory it will be zero'd.

RTE_MALLOC_DEBUG changes this so that:
   - when memory is freed it gets overwritten by a poison value
   - when malloc gets memory it will zero it.
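
As a quick sanity-check sketch of the behavior described above, allocating from
the DPDK heap and verifying the memory is zeroed (the size, alignment and tag
string are arbitrary example values):

#include <string.h>
#include <rte_malloc.h>

/* Allocate from hugepage-backed heap memory on a given NUMA socket.
 * rte_zmalloc_socket() should return zeroed memory whether or not
 * RTE_MALLOC_DEBUG is enabled. */
static int
check_zeroed(int socket_id)
{
	size_t len = 4096;
	unsigned char *buf = rte_zmalloc_socket("example", len, 64, socket_id);
	unsigned char zero[4096] = { 0 };
	int ok;

	if (buf == NULL)
		return -1;		/* out of hugepage memory */

	ok = (memcmp(buf, zero, len) == 0);
	rte_free(buf);			/* freed memory is cleared or poisoned */
	return ok ? 0 : -1;
}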



Re: Virtio_user as Exception Path insted of rte_kni on virtio driver is possible

2023-04-12 Thread Stephen Hemminger
On Fri, 7 Apr 2023 13:49:47 +
Yasin CANER  wrote:

> Hello all,
> 
> I would like to run a DPDK application on virtio driver that is in a 
> Ubuntu-20 VM.
> 
> 
>   1.  Can DPDK-22.11 create a virtio_user on virtio driver? Or is there a 
> another way to create KNI? I could not create VF .
>   2.  Is there a way to run virtio_user as Exception path via igb_uio instead 
> of VFIO?
>   3.  Or do i have to run via rte_kni?

Virtio has two sides, the host side and the guest side. With virtio_user, DPDK plays
the role of the host and the kernel device is the guest side. If running in a VM, the
usual case is that the virtio device is managed by the kernel (host side) and the DPDK
application is using the guest side.

> 
> I try to follow VFIO guide to create VF but it doesn't work. There is no SR-IOV 
> support in virtio driver as expected.
> 
> Best regards.
> 
> This parts not possible to run
> 
> echo 1 | sudo tee /sys/module/vfio_pci/parameters/enable_sriov
> 
> echo 2 > /sys/bus/pci/devices/:86:00.0/sriov_numvfs

SRIOV VF is a host (not guest VM) side feature.
So the above lines don't make any sense.


> DPDK version 22.11
> Ubuntu 20.04.5
> 5.4.0-146-generic
> https://doc.dpdk.org/guides-22.11/howto/virtio_user_as_exception_path.html#virtio-user-as-exception-path
> 55. Tun|Tap Poll Mode Driver - Data Plane Development Kit 22.11.1 
> documentation (dpdk.org)
> https://doc.dpdk.org/guides-22.11/linux_gsg/linux_drivers.html#linux-gsg-linux-drivers
> https://docs.kernel.org/driver-api/vfio.html
> 
> ethtool -i ens6
> driver: virtio_net
> version: 1.0.0
> firmware-version:
> expansion-rom-version:
> bus-info: :00:06.0
> supports-statistics: yes
> supports-test: no
> supports-eeprom-access: no
> supports-register-dump: no
> supports-priv-flags: no
> 

What is kernel log (dmesg) output?


Re: ethtool-like command on ice pmd

2023-04-12 Thread Stephen Hemminger
On Wed, 5 Apr 2023 18:01:47 +0200
"Ernesto Ruffini"  wrote:

> Is there a way to do it from inside DPDK?

Looking at the code of the DPDK ice driver, the answer is no.


Re: How to use --vdev Options for ./dpdk-l3fwd?

2023-04-11 Thread Stephen Hemminger
On Tue, 11 Apr 2023 12:51:54 -0400
Dinesh Kumar  wrote:

> Hi Stephen ,
> 
> Thanks for your suggestions.I am able to resolve --vdev error  however I am
> having another issue related to buffer.
> .*/dpdk-l3fwd -c f -n 4 --vdev=net_tap3 -- -p 0x3
> --config="(0,0,1),(0,1,2)"*
> EAL: Detected CPU lcores: 8
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available 1048576 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: :00:03.0 (socket
> 0)
> eth_virtio_pci_init(): Failed to init PCI device
> EAL: Requested device :00:03.0 cannot be used
> EAL: Probe PCI driver: net_iavf (8086:154c) device: :00:05.0 (socket 0)
> EAL: Probe PCI driver: net_iavf (8086:154c) device: :00:06.0 (socket 0)
> TELEMETRY: No legacy callbacks, legacy socket not created
> Neither LPM, EM, or FIB selected, defaulting to LPM
> Initializing port 0 ... Creating queues: nb_rxq=2 nb_txq=4... Port 0
> modified RSS hash function based on hardware support,requested:0xa38c
> configured:0x238c
> 
> 
> 
> *iavf_dev_init_vlan(): Failed to update vlan offloadiavf_dev_configure():
> configure VLAN failed: -95EAL: Error - exiting with code: 1  Cause: Cannot
> init mbuf pool on socket 0*
> 
> Do I need to update any parameters?
> Do I need to add a routing rule parameter?
> Any help /pointers will be really appreciated.
> FYI. I just want to testL3  forwarding via DPDK using the Tap interface The
> flow is :
> On Vm1 .*/dpdk-l3fwd will create a Tap interface and then I will link this
> interface with a network namespace and then ping a destination address via
> the DPDK application running on VM1 and get captured on DPDK application
> running on another VM2 and I am stuck with creation Tap interface on VM1
> via *  .
> */dpdk-l3fwd.*
> Regards,
> Dinesh Kumar
> 
> 
> 
> On Mon, Apr 10, 2023 at 7:08 PM Stephen Hemminger <
> step...@networkplumber.org> wrote:  
> 
> > On Mon, 10 Apr 2023 18:47:59 -0400
> > Dinesh Kumar  wrote:
> >  
> > > Hi There,
> > > I am new to the DPDK example and having issue with using  --vdev options
> > > for DPDK example and it is throwing the below error.
> > >
> > > ./dpdk-l3fwd --log-level *:debug -c f -n 4 -- -p 0x3 --vdev
> > > 'net_pcap0,rx_pcap=input.pcap,tx_pcap=output.pcap'
> > >
> > > ---
> > > iavf_check_api_version(): Peer is supported PF host
> > > iavf_read_msg_from_pf(): Can't read msg from AQ
> > > iavf_read_msg_from_pf(): AQ from pf carries opcode 3, retval 0
> > > iavf_dev_alarm_handler(): ICR01_ADMINQ is reported
> > > iavf_handle_pf_event_msg(): VIRTCHNL_EVENT_LINK_CHANGE event
> > > iavf_handle_virtchnl_msg(): adminq response is received, opcode = 26
> > > EAL: lib.telemetry log level changed from disabled to debug
> > > TELEMETRY: Attempting socket bind to path
> > > '/var/run/dpdk/rte/dpdk_telemetry.v2'
> > > TELEMETRY: Socket creation and binding ok
> > > TELEMETRY: Telemetry initialized ok
> > > TELEMETRY: No legacy callbacks, legacy socket not created
> > > *./dpdk-l3fwd: unrecognized option '--vdev'*
> > > ./dpdk-l3fwd [EAL options] -- -p PORTMASK [-P] [--lookup] --config
> > > (port,queue,lcore)[,(port,queue,lcore)] [--rx-queue-size NPKTS]
> > > [--tx-queue-size NPKTS] [--eth-dest=X,MM:MM:MM:MM:MM:MM] [--max-pkt-len
> > > PKTLEN] [--no-numa] [--hash-entry-num] [--ipv6] [--parse-ptype]
> > > [--per-port-pool] [--mode] [--eventq-sched] [--event-vector
> > > [--event-vector-size SIZE] [--event-vector-tmo NS]] [-E] [-L]
> > >
> > > please let me know if I am missing some options that need to be added  
> > with  
> > > --vdev  
> >
> >
> > DPDK options are split into the options for the DPDK infrastructure (EAL)
> > and those
> > used by the applications. They are separated by the -- option.
> >
> > In your example, the vdev option belongs to the DPDK infrastructure not
> > the application.
> > Put it it before the -- and it should work
> >  

I think you are assuming that the DPDK tap device is for using an
existing tap device. That is not correct.
The DPDK tap PMD creates a tap device for its own use.
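
For illustration, the tap PMD can also be added from code rather than with the
--vdev EAL argument; a small sketch, assuming the documented net_tap "iface"
devarg (the device name "net_tap0" and interface name "dtap0" are examples):

#include <rte_dev.h>
#include <rte_ethdev.h>

/* Ask the vdev bus to create a tap PMD instance.  The PMD itself
 * creates the kernel tap interface (here "dtap0"), which can then be
 * moved into a namespace or configured with ip(8). */
static int
add_tap_port(uint16_t *port_id)
{
	int ret = rte_eal_hotplug_add("vdev", "net_tap0", "iface=dtap0");
	if (ret < 0)
		return ret;

	/* Look up the new ethdev port by its device name. */
	return rte_eth_dev_get_port_by_name("net_tap0", port_id);
}

The equivalent EAL command-line form is --vdev=net_tap0,iface=dtap0, placed
before the "--" separator as discussed earlier in this thread.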


Re: DPDK and full TCP-IP stack

2023-04-10 Thread Stephen Hemminger
On Mon, 10 Apr 2023 13:08:15 -0400
fwefew 4t4tg <7532ya...@gmail.com> wrote:

> https://stackoverflow.com/questions/65841190/does-dpdk-provide-a-native-tcp-ip-network-stack-implemetation
> 
> points out there's no native TCP-IP in DPDK stack until v20.11. We're at
> 23.03. Was this completed?

No.
There was never a full TCP-IP stack in DPDK.
Doing one correctly, and supporting it, is a difficult effort.
There are many one-off projects doing TCP over DPDK.
And FD.IO has a TCP host stack 
https://fd.io/docs/vpp/v2101/whatisvpp/hoststack.html


Re: Measuring core frequency with dpdk

2023-04-10 Thread Stephen Hemminger
On Mon, 10 Apr 2023 16:48:14 +0200
Antonio Di Bacco  wrote:

> Is it possible to measure the core frequency using a DPDK api? Not the
> maximum or nominal frequency but the actual number of instruction
> cycles per second.
> 
> Best regards,
> Anna

The Time Stamp Counter https://en.wikipedia.org/wiki/Time_Stamp_Counter
gets incremented at the CPU clock rate. The DPDK API to read the TSC
clock rate is rte_get_tsc_hz().

Internally, DPDK determines the clock rate either by using
architecture-specific information if available, or by a simple heuristic
of counting ticks during a sleep().  See lib/eal/common/eal_common_timer.c
for the details.
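
A small sketch of using the TSC helpers to convert an elapsed cycle count into
time; what is being measured is a placeholder:

#include <stdio.h>
#include <inttypes.h>
#include <rte_cycles.h>

/* Measure elapsed time of a code section using the TSC.
 * rte_get_tsc_hz() returns the calibrated TSC frequency in Hz. */
static void
time_section(void)
{
	uint64_t hz = rte_get_tsc_hz();
	uint64_t start = rte_rdtsc();

	/* ... code under measurement ... */

	uint64_t cycles = rte_rdtsc() - start;
	printf("TSC rate %" PRIu64 " Hz, elapsed %.3f us\n",
	       hz, (double)cycles * 1e6 / (double)hz);
}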


Re: net_ring, net_memif are missing RX-missed counter

2023-03-15 Thread Stephen Hemminger
On Wed, 15 Mar 2023 20:41:06 +0300
Igor Gutorov  wrote:

> Hi,
> 
> I've noticed that net_ring and net_memif PMDs always report RX-missed
> counter as 0. Is it just a feature that is missing, or is it something that
> fundamentally cannot be implemented for these PMDs?

Rx missed is used by hardware devices to indicate the packets that the
driver could not receive because of a lack of resources. It has to come
from the hardware (or maybe the host in a virtual driver).

There probably is not an analogous counter in memif because it would
have to be maintained by the other side (sender) and then have a control
API for it.  If it is your application, then the sending side could
report how many times tx_burst was unable to send packets.
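
For comparison, this is where the counter shows up on hardware PMDs; a short
sketch reading it through the generic stats API (the port id is a placeholder):

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Print the RX-missed counter for a port.  On hardware NICs, imissed
 * is filled in from device counters; ring/memif PMDs have no such
 * counter to report, so it stays 0. */
static void
print_rx_missed(uint16_t port_id)
{
	struct rte_eth_stats stats;

	if (rte_eth_stats_get(port_id, &stats) == 0)
		printf("port %u rx missed: %" PRIu64 "\n",
		       port_id, stats.imissed);
}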


Re: Callback or hook into completion queues?

2023-03-07 Thread Stephen Hemminger
On Tue, 7 Mar 2023 17:07:41 -0500
fwefew 4t4tg <7532ya...@gmail.com> wrote:

> Once I call rte_eth_tx_burst() (Mellanox Connect5-LX) is there a way to
> inspect or get a callback when transmitted packets go into the NIC's
> completion queue?

Not in DPDK.

> 
> This is related to the earlier question on timestamping with rdtsc().
> Ideally I'd take the timestamp as soon (close) to the time the packet is on
> the wire.


> 
> I've looked at tx callbacks however this is invoked as the packet is about
> to go into the "hardware queue for transmission" meaning there's lot's of
> work + serialization of packet's data to electrical signals at NIC
> bandwidth to come before the packet is on the wire.
> 
> The ideal time to get run rdtsc() is when the NIC delivers a completion
> event to a CQ for packets sent.
> 
> Presumably there's something in the mlx5 driver or perhaps DMA library to
> do this?

Not that I have ever seen.
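
The closest existing hook is the TX callback mentioned above: it fires when
packets are handed to the PMD in rte_eth_tx_burst(), not when the NIC completes
them, so it only approximates "on the wire". A minimal sketch, assuming the
ethdev RX/TX callback support is compiled in (it is by default); port, queue
and the stored timestamp variable are placeholders:

#include <rte_ethdev.h>
#include <rte_cycles.h>
#include <rte_mbuf.h>

/* Callback invoked just before packets are passed to the PMD's TX burst
 * function.  This records enqueue time, not completion/wire time. */
static uint16_t
stamp_tx(uint16_t port, uint16_t queue, struct rte_mbuf *pkts[],
	 uint16_t nb_pkts, void *user_param)
{
	uint64_t *last_tsc = user_param;

	(void)port;
	(void)queue;
	(void)pkts;
	*last_tsc = rte_rdtsc();
	return nb_pkts;			/* pass all packets through */
}

static uint64_t last_tx_tsc;

static void
install_tx_stamp(uint16_t port_id, uint16_t queue_id)
{
	rte_eth_add_tx_callback(port_id, queue_id, stamp_tx, &last_tx_tsc);
}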


Re: DPDK22.11 on CentOS 7.x testpmd: No probed ethernet devices

2023-03-07 Thread Stephen Hemminger
On Tue, 7 Mar 2023 17:44:37 + (UTC)
Vikas Deolaliker  wrote:

> I am using dpdk 22.11 on CentOS 7.X on intel machine with two MLNX cards. 
> They are recognized by DPDK by both vfio-pci and uio-generic. I compiled it 
> using the instructions on 2. System Requirements — Data Plane Development Kit 
> 23.03.0-rc1 documentation (dpdk.org)


Mellanox (Nvidia) devices do not use VFIO or uio-generic.
Instead, they use rdma library.

https://doc.dpdk.org/guides/nics/mlx5.html

Make sure you have right libraries and drivers (including kernel).
https://doc.dpdk.org/guides/platform/mlx5.html#mlx5-common-compilation



Re: Using rdtsc to timestamp RTT of packets

2023-03-06 Thread Stephen Hemminger
On Sun, 5 Mar 2023 20:01:15 -0500
fwefew 4t4tg <7532ya...@gmail.com> wrote:

> I think rdtsc does all this. But then I read [1]:
> 
>- The TSC is not always invariant
>- And of course context switches (if a thread is not pinned to a core)
>will invalidate any time difference
>- The TSC is not incremented when the processor enters a deep sleep. I
>don't care about this because I'll turn off the power saving modes anyway

Stack Overflow is only one step better than ChatGPT in giving
partially correct answers.

The TSC almost always works well on modern processors.
The Linux kernel aligns all the TSC values for each core at boot up.
It is invariant (derived from a single clock source) unless you have some
poorly designed NUMA system.  In the past, there were some CPU's that
did bad things during suspend, but that is fixed in current generations.

Bottom line: that advice is no longer true.


Re: Dpdk test_pmd with pdump

2023-03-06 Thread Stephen Hemminger
On Wed, 1 Mar 2023 11:55:21 +0530
RANJEETH NEDUNURI  wrote:

> Hi ,
> I hope you are doing great, I have gone through your patches related to
> dpdk,I am working on dpdk, while executing dpdk testpmd and pdump we are
> getting errors.
> We are using dpdk 22.07 version and virtual nic on top of physical nic in
> Linux

This is not a supported version, please use 22.11.

> We are following this link:
> https://www.intel.com/content/www/us/en/developer/articles/technical/dpdk-packet-capture-framework.html

The Intel article is out of date; it pre-dates the meson build, etc.

The current docs for pdump is on website:
https://doc.dpdk.org/guides/tools/pdump.html

> Error: EAL: failed to parse device "EAL: failed to parse device
> "vdev:net_pcap_rx_0" EAL: Failed to hotplug add device on primary"
> 
> And one more how to give pcap file input to testpmd.
> 
> Could you please answer this we very thankful to you.


You may find that the newer DPDK dumpcap application is easier to work with.
It has fewer dependencies and more options.
https://doc.dpdk.org/guides/tools/dumpcap.html



Re: rte_flow: no ability to match on packet length?

2023-03-05 Thread Stephen Hemminger
On Tue, 28 Feb 2023 15:12:31 +
Tony Hart  wrote:

> I’m trying to use the Generic Flow API (rte_flow) to match IP packets based 
> on their length (either L2, L3 or L4 lengths).
>  
> There doesn’t seem to be an item type that explicitly matches based on length 
> (RTE_FLOW_ITEM_TYPE_x).  So I’ve tried using a mask with 
> RTE_FLOW_ITEM_TYPE_IPV4 to match on the total_length field (and similar 
> attempt to match on the UDP header dgram_len field) but the NIC I’m using 
> (mlx5) returns an error (mask enables non supported bits).
>  
> Am I out of luck, or maybe missing something?
>  
> Thanks for any insights!
>  
> I’ve tried, DPDK: 20.11.7 and 22.11.1
> 
> Tony Hart | Chief Architect
> tony.h...@corero.com  

Short answer: yes, you are right there is no generic length match.

Longer answer: rte_flow is an API which is meant to provide access to the underlying
match features of NIC hardware. Supporting something requires that the HW/FW can do
the match, and that the driver writer has added (and tested) that match.

Hopefully the MLX5 experts can help answer what is possible.


Re: Multi-process limitations when using the dumpcap tool

2023-03-03 Thread Stephen Hemminger
On Sat, 4 Mar 2023 00:37:53 +0200
Isaac Boukris  wrote:

> On Sat, Mar 4, 2023 at 12:18 AM Stephen Hemminger
>  wrote:
> >
> > On Fri, 3 Mar 2023 12:33:20 +0200
> > Isaac Boukris  wrote:
> >  
> > > Hello,
> > >
> > > The dumpcap documentation points out that it runs as a secondary process,
> > > as such I was wondering whether the multi-process limitations such as the
> > > requirement to disable ASLR on both processes, and more importantly the
> > > limitations regarding the use of librte_hash, also apply when using the
> > > dumpcap tool?
> > >
> > > Thanks!  
> >
> > Dumpcap is passive and I have not heard of any problems related to ASLR.  
> 
> I realized upon sending that the librte_hash limitation is likely only
> when sharing tables between processes, I guess that's what you mean by
> passive.
> 
> Thanks!

It is more about where data could be allocated. The only allocation in dumpcap
or pdump that matters is the associated mbuf pool. This is allocated out of
normal hugepage memory which is pinned in both primary/secondary at the same
address.


Re: Multi-process limitations when using the dumpcap tool

2023-03-03 Thread Stephen Hemminger
On Fri, 3 Mar 2023 12:33:20 +0200
Isaac Boukris  wrote:

> Hello,
> 
> The dumpcap documentation points out that it runs as a secondary process,
> as such I was wondering whether the multi-process limitations such as the
> requirement to disable ASLR on both processes, and more importantly the
> limitations regarding the use of librte_hash, also apply when using the
> dumpcap tool?
> 
> Thanks!

Dumpcap is passive and I have not heard of any problems related to ASLR.


Re: Add more lcores to application after rte_eal_init has been called

2023-02-22 Thread Stephen Hemminger
On Wed, 22 Feb 2023 14:10:20 +0100
Antonio Di Bacco  wrote:

> I need to add some more cores to the application after the DPDK has
> already been initialised.
> Is it possible?

No

> For other resources like network cards, I managed to probe new cards
> dynamically but don't understand how I can do the same for new lcores.

You can start with lots of cores, but only use some of them in your
application. The unused lcores could block on something like a pthread
condition variable.
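
A sketch of that "park the extra lcores" approach: launch a worker on every
lcore at startup and let the ones not yet needed block on a condition variable
until the application decides to use them. All names here are illustrative:

#include <pthread.h>
#include <stdbool.h>
#include <rte_lcore.h>

static pthread_mutex_t park_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  park_cond = PTHREAD_COND_INITIALIZER;
static bool lcore_enabled[RTE_MAX_LCORE];

/* Worker launched on every lcore at startup; it blocks (so it does not
 * burn the CPU) until the main thread enables it. */
static int
parked_worker(void *arg)
{
	unsigned int id = rte_lcore_id();
	(void)arg;

	pthread_mutex_lock(&park_lock);
	while (!lcore_enabled[id])
		pthread_cond_wait(&park_cond, &park_lock);
	pthread_mutex_unlock(&park_lock);

	/* ... real per-lcore work starts here ... */
	return 0;
}

/* Called later from the main lcore to "add" a core to the workload. */
static void
enable_lcore(unsigned int id)
{
	pthread_mutex_lock(&park_lock);
	lcore_enabled[id] = true;
	pthread_cond_broadcast(&park_cond);
	pthread_mutex_unlock(&park_lock);
}

The workers themselves would be launched once at startup, for example with
rte_eal_mp_remote_launch(parked_worker, NULL, SKIP_MAIN).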


Re: Dpdk allocates more memory, than available physically (hugepages)

2023-02-13 Thread Stephen Hemminger
On Mon, 13 Feb 2023 18:46:14 +0300
Dmitry Kozlyuk  wrote:

> 2023-02-08 03:43 (UTC+0100), Szymon Szozda:
> > Hey,
> > I'm running dpdk on a machine with 64GB of RAM. It is configured, so 16GiB
> > (16 x 1GiB chunks) of hugepage memory is reserved on boot. I was expecting
> > dpdk to consume only those 16GiB, but it seems it gets more than 30GiB of
> > virtual memory ( I base it on memory VSZ output of top command ). The
> > machine is 1 NUMA, 1 NIC. I did some debugging and I do not see any logic
> > which limits the memory consumption, basically it seems that
> > eal_dynmem_memseg_lists_init() will allocate the same amount, no matter how
> > much RAM is physically available.
> > 
> > Is it expected? How to know that setup will not crash due to
> > insufficient memory available? How to limit those memory consumption.by
> > dpdk?  
> 
> Hi,
> 
> DPDK always reserves a large chunk of virtual address space,
> but this costs almost nothing and does not need to be limited.
> Then DPDK maps and unmaps actual pages to those addresses as needed.
> DPDK does not crash if it runs out of hugepages reserved in the system
> but merely returns NULL from its allocation API (rte_malloc).
> Real memory consumption can be limited with --socket-limit EAL option.
> See also --socket-mem and -m to reserve hugepages at DPDK startup.

Hugepages are mainly used for buffers and shared driver resources.
The normal text and data are not in hugepages (unless you figure
out how to use hugetlbfs). Memory consumption limiting is best
done with cgroups. But most applications die if they can't get the
memory they need.  You can see where memory is used with /proc/XXX/maps

