On 7 August 2017 at 11:40, Raymond Burkholder <r...@oneunified.net> wrote:
>
> on some platforms, like linux, you need to check ‘ethtool -S’ to see if the
> operating system is dropping packets (on tx or rx). which may require some
> performance tuning of the network interfaces.
Yeah, ethtool -C is important: it sets the RX interrupt coalescing (NET_RX), which you want as low as you can get it. Without using one of the third-party libraries like Netmap, DPDK or VPP (or similar) to implement kernel-bypass techniques, or a tool that uses them, you have to make lots of "tweaks" to get even a fraction of that bandwidth or pps rate.

EtherateMT uses Tx and Rx ring buffers (PACKET_MMAP, i.e. the PACKET_TX_RING/PACKET_RX_RING socket options), so an AF_PACKET socket can flush the whole ring with a single syscall and a single context switch. It forcefully increases the OS socket send/receive buffer sizes, it uses PACKET_QDISC_BYPASS to bypass the Linux queuing discipline sub-system (basically skipping any QoS configuration), it uses PACKET_LOSS so that malformed frames in the Tx ring are skipped rather than halting transmission, and it can use FANOUT groups to spray traffic over all Tx/Rx queues in the NIC. One can also use isolcpus and nohz_full. I have some notes on host tuning I can share if anyone is interested; I'd just need to dig them out. However, even with all of those, DPDK et al. are still much faster.

> also, on a linux platform, the kernel guys use some trace tools, one of which
> will create one buffer, and copy it to the network interface, making a very
> effective high bandwidth tester, with some purporting to fill a 10g link. I
> don’t have the name off the top of my head.

You might be thinking of pktgen (the kernel module, not the DPDK-based app!), which I believe can do 10Gbps using 64-byte packets. I think (could be wrong here) that over the years it morphed into trafgen in the netsniff-ng package: http://netsniff-ng.org/

By loading it into the kernel there is arguably one less copy from the user-land process into kernel memory (as is the case with sendto(), for example; https://linux.die.net/man/2/sendto), but by using ring buffers, one syscall can be used to send or receive many packets between the user-land process and sk_buffs in kernel memory, and on into DMA space.
DPDK uses similar ideas, but it has something called the EAL (Environment Abstraction Layer), which can provide XSS with minimal effort from the user, and it can DMA directly from its ring buffer, removing another copy per packet compared with Linux's AF_PACKET module (as well as loads of other cool shit). VPP, which builds on DPDK, recently passed the 1Tbps mark (10x100Gbps interfaces with something like 1M routes in the FIB) using the new Intel Skylake CPUs. They have achieved a per-packet budget that was stupidly low, something like 200 instructions per packet.

> this being a cisco list, some cisco platforms have built in ttcp performance
> testers.

I always forget about that, but I've never had a particularly great experience with it. It's there on some ISR models, and I also used it on the ME3x00 switches once, but the throughput was something like 20Mbps and I found it quite flaky.

I think I'm hijacking this thread a bit with my own rants. Sorry about that,
James.

_______________________________________________
cisco-nsp mailing list  cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/