> From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> Sent: Thursday, August 27, 2020 11:10 AM
>
> On Thu, Aug 27, 2020 at 10:40:11AM +0200, Morten Brørup wrote:
> > Jeff and Ethernet API maintainers Thomas, Ferruh and Andrew,
> >
> > I'm hijacking this patch thread to propose a small API modification
> > that prevents unnecessary performance degradations.
> >
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jeff Guo
> > > Sent: Thursday, August 27, 2020 9:55 AM
> > >
> > > The limitation of burst size in vector rx was removed, since it
> > > should retrieve as many received packets as possible. And also the
> > > scattered receive path should use a wrapper function to achieve the
> > > goal of burst maximizing.
> > >
> > > This patch set aims to maximize vector rx burst for
> > > ixgbe/i40e/ice/iavf PMDs.
> > >
> >
> > Now I'm going to be pedantic and say that it still doesn't conform to
> > the rte_eth_rx_burst() API, because the API does not specify any
> > minimum requirement for nb_pkts.
> >
> > In theory, that could also be fixed in the driver by calling the
> > non-vector function from the vector functions if nb_pkts is too small
> > for the vector implementation.
> >
> > However, I think that calling rte_eth_rx_burst() with a small nb_pkts
> > is silly and not in the spirit of DPDK, and introducing an additional
> > comparison for a small nb_pkts in the driver vector functions would
> > degrade their performance (only slightly, but anyway).
> >
>
> Actually, I'd like to see a confirmed measurement showing a slowdown
> before we discard such an option. :-)

Good point!

> While I agree that using small bursts is not in keeping with the
> design approach of DPDK of using large bursts to amortize costs and
> allow prefetching, there are cases where a user/app may want a small
> burst size, e.g. 4, for latency reasons, and we need a way to support
> that.
>

I assume that calling rte_eth_rx_burst() with nb_pkts=32 returns 4 packets if only 4 packets are available, so you would need to be extremely latency sensitive to call it with a smaller nb_pkts. I guess that high frequency trading is the only real life scenario here.

> Since the path selection is dynamic, we need to either:
> a) provide a way for the user to specify that they will use smaller
>    bursts and so that vector functions should not be used
> b) have the vector functions transparently fall back to the scalar
>    ones if used with smaller bursts
>
> Of these, option b) is simpler, and should be low cost since any check
> is just once per burst, and - assuming an app is written using the
> same request-size each time - should be entirely predictable after the
> first call.
>

Why does everyone assume that DPDK applications are so simple that the branch predictor will cover the entire data path? I hear this argument over and over again, and on principle I disagree with it!

How about c): add rte_eth_rx() and rte_eth_tx() functions for receiving/transmitting a single packet. The ring library has such functions.

Optimized single-packet functions might even perform better than calling the burst functions with nb_pkts=1. Great for latency focused applications. :-)

> /Bruce
>
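
To make options b) and c) concrete, here is a rough sketch in plain C. It is a mock and not driver code: struct pkt, struct rxq, VEC_MIN_BURST, rx_scalar(), rx_vector_raw(), rx_burst_vec() and rx_single() are all hypothetical stand-ins made up for illustration, and the "vector" path contains no SIMD at all. It only shows where the extra per-burst comparison from option b) would sit, and what a single-packet receive in the spirit of option c) might look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_MIN_BURST 4 /* hypothetical: smallest burst the vector path handles */

struct pkt { uint32_t id; };   /* stand-in for a packet buffer */

struct rxq {                   /* stand-in for a driver RX queue */
    struct pkt *ring;          /* packets waiting to be received */
    uint16_t avail;            /* number of packets still waiting */
    uint16_t next;             /* index of the next packet to hand out */
    unsigned scalar_calls;     /* how often the scalar path ran */
    unsigned vector_calls;     /* how often the vector path ran */
};

/* Scalar path: hands out one packet at a time, valid for any nb_pkts. */
static uint16_t rx_scalar(struct rxq *q, struct pkt **out, uint16_t nb_pkts)
{
    uint16_t n = 0;

    assert(q != NULL && out != NULL);
    q->scalar_calls++;
    while (n < nb_pkts && q->avail > 0) {
        out[n++] = &q->ring[q->next++];
        q->avail--;
    }
    return n;
}

/* Mock vector path: rounds nb_pkts down to a multiple of VEC_MIN_BURST,
 * so on its own it would return 0 for nb_pkts < VEC_MIN_BURST. */
static uint16_t rx_vector_raw(struct rxq *q, struct pkt **out, uint16_t nb_pkts)
{
    uint16_t n = 0;

    assert(q != NULL && out != NULL);
    q->vector_calls++;
    nb_pkts = (uint16_t)(nb_pkts - nb_pkts % VEC_MIN_BURST);
    while (n < nb_pkts && q->avail > 0) {
        out[n++] = &q->ring[q->next++];
        q->avail--;
    }
    return n;
}

/* Option b): one comparison per burst, then fall back to the scalar path. */
static uint16_t rx_burst_vec(struct rxq *q, struct pkt **out, uint16_t nb_pkts)
{
    if (nb_pkts < VEC_MIN_BURST)
        return rx_scalar(q, out, nb_pkts);
    return rx_vector_raw(q, out, nb_pkts);
}

/* Option c): hypothetical single-packet receive, NULL when the queue is empty. */
static struct pkt *rx_single(struct rxq *q)
{
    assert(q != NULL);
    if (q->avail == 0)
        return NULL;
    q->avail--;
    return &q->ring[q->next++];
}
```

The sketch says nothing about cost. Whether the comparison in rx_burst_vec() is measurable, and whether something like rx_single() beats a burst call with nb_pkts=1, would still have to be measured on a real PMD.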