On Tue, Sep 14, 2021 at 7:03 PM Ananyev, Konstantin
<konstantin.anan...@intel.com> wrote:
>
> Hi Jerin,
>
> > > NOTE: This is just an RFC to start further discussion and collect the
> > > feedback.
> > > Due to the significant amount of work required, the changes are applied
> > > only to two PMDs so far: net/i40e and net/ice.
> > > So to build it you'll need to add:
> > > -Denable_drivers='common/*,mempool/*,net/ice,net/i40e'
> > > to your config options.
> > >
> > > That approach was selected to avoid(/minimize) possible performance
> > > losses.
> > >
> > > So far I have done only a limited amount of functional and performance
> > > testing. I didn't spot any functional problems, and the performance
> > > numbers remain the same before and after the patch on my box
> > > (testpmd, macswap fwd).
> >
> > Based on testing on octeontx2, we see some regression in testpmd and
> > a bit in l3fwd too.
> >
> > Without patch: 73.5mpps/core in testpmd iofwd
> > With patch: 72.5mpps/core in testpmd iofwd
> >
> > Based on my understanding it is due to additional indirection.
>
> From your patch below, it looks like it is not actually an additional
> indirection, but an extra memory dereference - func and dev pointers are
> now stored in different places.
Yup. I meant the same. We are on the same page.

> Plus the fact that now we dereference rte_eth_devices[] data inside the
> PMD function, which probably prevents the compiler and CPU from loading
> rte_eth_devices[port_id].data and
> rte_eth_devices[port_id].pre_tx_burst_cbs[queue_id]
> in advance, before calling the actual RX/TX function.

Yes.

> About your approach: I don't mind adding an extra opaque 'void *data'
> pointer, but would prefer not to expose the callback invocation code in
> an inline function.
> The main reason for that: I think it still needs to be reworked to allow
> adding/removing callbacks without stopping the device, similar to what
> was done for cryptodev callbacks. To be able to do that in the future
> without another ABI breakage, the callback-related part needs to be
> kept internal.
> Though what we probably can do: add two dynamic arrays of opaque
> pointers to rte_eth_burst_api - one for rx/tx queue data pointers, a
> second for rx/tx callback pointers.
> To be more specific, something like:
>
> typedef uint16_t (*rte_eth_rx_burst_t)(void *rxq, struct rte_mbuf **rx_pkts,
>                 uint16_t nb_pkts, void *cbs);
> typedef uint16_t (*rte_eth_tx_burst_t)(void *txq, struct rte_mbuf **tx_pkts,
>                 uint16_t nb_pkts, void *cbs);
> ....
>
> struct rte_eth_burst_api {
>         rte_eth_rx_burst_t rx_pkt_burst;
>         /**< PMD receive function. */
>         rte_eth_tx_burst_t tx_pkt_burst;
>         /**< PMD transmit function. */
>         rte_eth_tx_prep_t tx_pkt_prepare;
>         /**< PMD transmit prepare function. */
>         rte_eth_rx_queue_count_t rx_queue_count;
>         /**< Get the number of used RX descriptors. */
>         rte_eth_rx_descriptor_status_t rx_descriptor_status;
>         /**< Check the status of a Rx descriptor. */
>         rte_eth_tx_descriptor_status_t tx_descriptor_status;
>         /**< Check the status of a Tx descriptor. */
>         struct {
>                 void **queue_data;
>                 /* points to rte_eth_devices[port_id].data->rx_queues */
>                 void **cbs;
>                 /* points to rte_eth_devices[port_id].post_rx_burst_cbs */
>         } rx_data, tx_data;
> } __rte_cache_aligned;
>
> static inline uint16_t
> rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
>                 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
> {
>         struct rte_eth_burst_api *p;
>
>         if (port_id >= RTE_MAX_ETHPORTS ||
>                         queue_id >= RTE_MAX_QUEUES_PER_PORT)
>                 return 0;
>
>         p = &rte_eth_burst_api[port_id];
>         return p->rx_pkt_burst(p->rx_data.queue_data[queue_id], rx_pkts,
>                         nb_pkts, p->rx_data.cbs[queue_id]);

That works.

> }
>
> Same for TX.
>
> If that looks ok to everyone, I'll try to prepare the next version
> based on that.

Looks good to me.

> In theory that should avoid the extra dereference problem and even
> reduce indirection.
> As a drawback, data->rxq/txq should always be allocated for
> RTE_MAX_QUEUES_PER_PORT entries, but I presume that's not a big deal.
>
> As a side question - is there any reason why rte_ethdev_trace_rx_burst()
> is invoked at the very last point, while rte_ethdev_trace_tx_burst() is
> invoked after the CBs but before the actual tx_pkt_burst()?
> It would make things simpler if tracing were always done either on
> entrance or exit of rx/tx_burst.

exit is fine.

> > My suggestion is to fix the problem by:
> > removing the additional `data` redirection and pulling the callback
> > function pointers back, keeping the rest as opaque as in the existing
> > patch like [1].
> >
> > I don't believe this has any real implication on future ABI stability,
> > as we will not be adding any new items to rte_eth_fp in any way; new
> > features can be added in the slowpath rte_eth_dev as mentioned in the
> > patch.
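To make sure I read the proposal right, below is a minimal standalone
mock-up of that dispatch path. All names here (burst_api, eth_rx_burst,
dummy_rx_burst, struct mbuf) are placeholders rather than the real DPDK
symbols, and the callback invocation itself is left out:

#include <stdint.h>
#include <stdio.h>

#define MAX_PORTS  4
#define MAX_QUEUES 8

struct mbuf { int len; };

/* Per-queue data and the callback list are handed to the driver directly,
 * so the fast path never touches the (cold) rte_eth_devices[] entry. */
typedef uint16_t (*rx_burst_t)(void *rxq, struct mbuf **pkts,
                uint16_t nb, void *cbs);

struct burst_api {
        rx_burst_t rx_pkt_burst;        /* PMD receive function */
        struct {
                void **queue_data;      /* -> device data rx_queues[] */
                void **cbs;             /* -> per-queue RX callback lists */
        } rx_data;
};

/* Mock "device data" arrays that queue_data/cbs point into. */
static void *rx_queues[MAX_QUEUES];
static void *rx_cbs[MAX_QUEUES];

static struct burst_api burst_api[MAX_PORTS];

/* Dummy PMD receive function: pretends everything was received. */
static uint16_t
dummy_rx_burst(void *rxq, struct mbuf **pkts, uint16_t nb, void *cbs)
{
        (void)rxq; (void)pkts; (void)cbs;
        return nb;
}

/* Fast-path wrapper: one flat-array lookup, no second dereference. */
static inline uint16_t
eth_rx_burst(uint16_t port, uint16_t queue, struct mbuf **pkts, uint16_t nb)
{
        struct burst_api *p;

        if (port >= MAX_PORTS || queue >= MAX_QUEUES)
                return 0;

        p = &burst_api[port];
        return p->rx_pkt_burst(p->rx_data.queue_data[queue], pkts, nb,
                        p->rx_data.cbs[queue]);
}

int main(void)
{
        struct mbuf *pkts[32];

        burst_api[0].rx_pkt_burst = dummy_rx_burst;
        burst_api[0].rx_data.queue_data = rx_queues;
        burst_api[0].rx_data.cbs = rx_cbs;

        printf("received %u packets\n", (unsigned)eth_rx_burst(0, 0, pkts, 32));
        return 0;
}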
Ack. I will be happy to test again after the rework and report any
performance issues, if any. Thanks for the great work :-)

> > [2] is the patch doing the same; I don't see any performance
> > regression after [2].
> >
> > [1]
> > -struct rte_eth_burst_api {
> > +struct rte_eth_fp {
> > +       void *data;
> >         rte_eth_rx_burst_t rx_pkt_burst;
> >         /**< PMD receive function. */
> >         rte_eth_tx_burst_t tx_pkt_burst;
> > @@ -85,8 +100,19 @@ struct rte_eth_burst_api {
> >         /**< Check the status of a Rx descriptor. */
> >         rte_eth_tx_descriptor_status_t tx_descriptor_status;
> >         /**< Check the status of a Tx descriptor. */
> > +       /**
> > +        * User-supplied functions called from rx_burst to post-process
> > +        * received packets before passing them to the user
> > +        */
> > +       struct rte_eth_rxtx_callback
> > +               *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> > +       /**
> > +        * User-supplied functions called from tx_burst to pre-process
> > +        * packets before passing them to the driver for transmission.
> > +        */
> > +       struct rte_eth_rxtx_callback
> > +               *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> >         uintptr_t reserved[2];
> > -} __rte_cache_min_aligned;
> > +} __rte_cache_aligned;
> >
> > [2]
> > https://pastebin.com/CuqkrCW4
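For comparison, here is a similarly stripped-down sketch of the [1]
layout, where the per-queue callback arrays live inside the per-port
fast-path struct itself, so a single &eth_fp[port] load reaches the
function pointer, the opaque data and the callbacks. Everything here
(eth_fp, eth_rx_burst, the burst signature) is again a placeholder
guess, since the full patch is only in [2]:

#include <stdint.h>
#include <stdio.h>

#define MAX_PORTS  4
#define MAX_QUEUES 8

struct mbuf { int len; };

/* Guessed signature: the PMD receives the opaque per-port 'data'
 * pointer and resolves queue_id to a queue internally. */
typedef uint16_t (*rx_burst_t)(void *data, uint16_t queue,
                struct mbuf **pkts, uint16_t nb);

/* Everything the fast path needs sits in one per-port struct. */
struct eth_fp {
        void *data;                          /* opaque per-port PMD data */
        rx_burst_t rx_pkt_burst;             /* PMD receive function */
        void *post_rx_burst_cbs[MAX_QUEUES]; /* per-queue RX callbacks */
        void *pre_tx_burst_cbs[MAX_QUEUES];  /* per-queue TX callbacks */
};

static struct eth_fp eth_fp[MAX_PORTS];

/* Dummy PMD receive function: pretends everything was received. */
static uint16_t
dummy_rx_burst(void *data, uint16_t queue, struct mbuf **pkts, uint16_t nb)
{
        (void)data; (void)queue; (void)pkts;
        return nb;
}

/* Fast-path wrapper: one struct lookup, callbacks already in reach. */
static inline uint16_t
eth_rx_burst(uint16_t port, uint16_t queue, struct mbuf **pkts, uint16_t nb)
{
        struct eth_fp *p;

        if (port >= MAX_PORTS || queue >= MAX_QUEUES)
                return 0;

        p = &eth_fp[port];
        return p->rx_pkt_burst(p->data, queue, pkts, nb);
}

int main(void)
{
        struct mbuf *pkts[32];

        eth_fp[0].rx_pkt_burst = dummy_rx_burst;
        printf("received %u packets\n", (unsigned)eth_rx_burst(0, 0, pkts, 32));
        return 0;
}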