tc qdisc https://linux.die.net/man/8/tc
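For context, the queue discipline Stephen refers to can be inspected and changed with `tc`. A minimal sketch, assuming the interface is named eth0 (a placeholder; substitute the real SR-IOV interface) and that fq_codel is available in the kernel:

```shell
# Show the qdisc currently attached to the interface (eth0 is a placeholder).
tc qdisc show dev eth0

# Check the system-wide default qdisc used for new interfaces.
sysctl net.core.default_qdisc

# Example only: replace the root qdisc with fq_codel (requires root).
tc qdisc replace dev eth0 root fq_codel
```

Note that the qdisc only affects the kernel-stack test; with DPDK the NIC bypasses the Linux traffic-control layer entirely, so a qdisc difference could show up as a gap between the two setups.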
On Thu, May 7, 2020 at 3:47 AM Pavel Vajarov <frea...@gmail.com> wrote:
> On Wed, May 6, 2020 at 5:55 PM Stephen Hemminger <step...@networkplumber.org> wrote:
> > On Wed, 6 May 2020 08:14:20 +0300 Pavel Vajarov <frea...@gmail.com> wrote:
> > >
> > > Hi there,
> > >
> > > We are trying to compare the performance of the DPDK+FreeBSD networking
> > > stack vs the standard Linux kernel, and we are having trouble finding out
> > > why the former is slower. The details are below.
> > >
> > > There is a project called F-Stack <https://github.com/F-Stack/f-stack>.
> > > It glues the networking stack from FreeBSD 11.01 over DPDK. We made a
> > > setup to test the performance of a transparent TCP proxy based on F-Stack
> > > and another one running on the standard Linux kernel.
> > > We ran the tests on KVM with 2 cores (Intel(R) Xeon(R) Gold 6139 CPU @
> > > 2.30GHz) and 32 GB RAM. A 10 Gbps NIC was attached in passthrough mode.
> > > The application-level code, the part which handles epoll notifications
> > > and memcpys data between the sockets, is 100% the same in both proxy
> > > applications. Both proxies are single-threaded, and in all tests we
> > > pinned the applications to core 1. For the standard Linux test, the
> > > interrupts from the network card were pinned to the same core 1.
> > >
> > > Here are the test results:
> > > 1. The Linux-based proxy was able to handle about 1.7-1.8 Gbps before it
> > >    started to throttle the traffic. No visible CPU usage was observed on
> > >    core 0 during the tests; only core 1, where the application and the
> > >    IRQs were pinned, took the load.
> > > 2. The DPDK+FreeBSD proxy was able to handle 700-800 Mbps before it
> > >    started to throttle the traffic. No visible CPU usage was observed on
> > >    core 0 during the tests; only core 1, where the application was
> > >    pinned, took the load.
> > >    In some of the later tests I changed the number of packets read from
> > >    the network card in one call and the number of events handled in one
> > >    call to epoll. With these changes I was able to increase the
> > >    throughput to 900-1000 Mbps, but couldn't increase it further.
> > > 3. We did another test with the DPDK+FreeBSD proxy just to give us some
> > >    more info about the problem. We disabled the TCP proxy functionality
> > >    and let the packets simply be IP-forwarded by the FreeBSD stack. In
> > >    this test we reached up to 5 Gbps without the traffic being
> > >    throttled. We just don't have more traffic to redirect there at the
> > >    moment. So the bottleneck seems to be either in the upper layers of
> > >    the network stack or in the application code.
> > >
> > > There is a Huawei switch which redirects the traffic to this server. It
> > > regularly sends arping, and if the server doesn't respond it stops the
> > > redirection. So we assumed that when the redirection stops, it's because
> > > the server throttles the traffic and drops packets, and can't respond to
> > > the arping because of the packet drops.
> > >
> > > The whole application can be very roughly represented in the following
> > > way:
> > > - Write pending outgoing packets to the network card
> > > - Read incoming packets from the network card
> > > - Push the incoming packets to the FreeBSD stack
> > > - Call epoll_wait/kevent without waiting
> > > - Handle the events
> > > - Loop from the beginning
> > > According to the performance profiling that we did, aside from packet
> > > processing, about 25-30% of the application time seems to be spent in
> > > epoll_wait/kevent, even though the `timeout` parameter of this call is
> > > set to 0, i.e. it shouldn't block waiting for events if there are none.
> > >
> > > I can give you much more details and code for everything, if needed.
> > >
> > > My questions are:
> > > 1.
> > > Does somebody have observations or educated guesses about what amount
> > >    of traffic I should expect the DPDK + FreeBSD stack + kevent to
> > >    process in the above scenario? Are the numbers low or expected?
> > >    We expected to see better performance than the standard Linux kernel
> > >    one, but so far we can't get it.
> > > 2. Do you think the difference comes from the time spent handling
> > >    packets versus handling epoll in the two tests? What I mean is: for
> > >    the standard Linux tests, interrupt handling has higher priority than
> > >    epoll handling, and thus the application can spend much more time
> > >    handling and processing packets in the kernel than handling epoll
> > >    events in user space. For the DPDK+FreeBSD case, the time for
> > >    handling packets and the time for processing epoll events is roughly
> > >    equal. I think that this is why we were able to get more performance
> > >    by increasing the number of packets read in one go and decreasing the
> > >    epoll events. However, we couldn't increase the throughput enough
> > >    with these tweaks.
> > > 3. Can you suggest something else that we can test/measure/profile to
> > >    get a better idea of what exactly is happening here and to improve
> > >    the performance further?
> > >
> > > Any help is appreciated!
> > >
> > > Thanks in advance,
> > > Pavel.
> >
> > First off, if you are testing on KVM, are you using PCI pass-through or
> > SR-IOV to make the device available to the guest directly? The default
> > mode uses a Linux bridge, and this results in multiple copies and context
> > switches. You end up testing Linux bridge and virtio performance, not TCP.
> >
> > To get full speed with TCP and most software stacks you need TCP
> > segmentation offload.
> >
> > Also, the software queue discipline, kernel version, and TCP congestion
> > control can have a big role in your result.
>
> Hi,
>
> Thanks for the response.
>
> We did the tests on Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-96-generic x86_64).
> The NIC was given to the guest using SR-IOV.
> TCP segmentation offload was enabled for both tests (standard Linux and
> DPDK+FreeBSD).
> The congestion control algorithm for both tests was 'cubic'.
>
> What do you mean by 'software queue discipline'?
>
> Regards,
> Pavel.

--
Regards,
Dave Seddon
+1 415 857 5102