Hi Bing, > -----Original Message----- > From: Bing Zhao <bi...@nvidia.com> > Sent: Wednesday, October 7, 2020 2:22 PM > Subject: RE: [PATCH 1/4] ethdev: add hairpin bind and unbind APIs > > Hi Ori, > > > -----Original Message----- > > From: Ori Kam <or...@nvidia.com> > > Sent: Sunday, October 4, 2020 5:20 PM > > Subject: RE: [PATCH 1/4] ethdev: add hairpin bind and unbind APIs > > > > Hi Bing, > > > > PSB, > > > > Thanks, > > Ori > > > -----Original Message----- > > > From: Bing Zhao <bi...@nvidia.com> > > > Sent: Thursday, October 1, 2020 3:26 AM > > > Cc: dev@dpdk.org > > > Subject: [PATCH 1/4] ethdev: add hairpin bind and unbind APIs > > > > > > In single port hairpin mode, all the hairpin TX and RX queues > > belong > > > to the same device. After the queues are set up properly, there is > > no > > > other dependency between the TX queue and its RX peer queue. The > > > binding process that connected the TX and RX queues together from > > > hardware level will be done automatically during the device start > > > procedure. Everything required is configured and initialized > > already > > > for the binding process. > > > > > > But in two ports hairpin mode, there will be some cross- > > dependences > > > between two different ports. Usually, the ports will be > > initialized > > > serially by the main thread but not in parallel. The earlier port > > will > > > not be able to enable the bind if the following peer port is not > > yet > > > configured with HW resources. What's more, if one port is detached > > / > > > attached dynamically, it would introduce more trouble for the > > hairpin > > > binding. > > > > > > To overcome these, new APIs for binding and unbinding are added. > > > During startup, only the hairpin TX and RX peer queues will be set > > up. > > > Nothing will be done when starting the device if the queues are > > > without auto-bind attribute. Only after the required ports pair > > > started, the `rte_eth_hairpin_bind()` API can be called to bind > > the > > > all TX queues of the egress port to the RX queues of the peer port. > > > Then the connection between the egress and ingress ports pair will > > be > > > established. > > > > > > The `rte_eth_hairpin_unbind()` API could be used to disconnect the > > > egress and the peer ingress ports. This should only be called > > before > > > the device is closed if needed. When doing the clean up, all the > > > egress and ingress pairs related to a single port should be taken > > into > > > consideration. > > > > > > Signed-off-by: Bing Zhao <bi...@nvidia.com> > > > --- > > > lib/librte_ethdev/rte_ethdev.c | 107 > > > +++++++++++++++++++++++++++++++ > > > lib/librte_ethdev/rte_ethdev.h | 51 +++++++++++++++ > > > lib/librte_ethdev/rte_ethdev_driver.h | 52 +++++++++++++++ > > > lib/librte_ethdev/rte_ethdev_version.map | 2 + > > > 4 files changed, 212 insertions(+) > > > > > > diff --git a/lib/librte_ethdev/rte_ethdev.c > > > b/lib/librte_ethdev/rte_ethdev.c index dfe5c1b..72f567b 100644 > > > --- a/lib/librte_ethdev/rte_ethdev.c > > > +++ b/lib/librte_ethdev/rte_ethdev.c > > > @@ -2175,6 +2175,113 @@ rte_eth_tx_hairpin_queue_setup(uint16_t > > > port_id, uint16_t tx_queue_id, > > > return eth_err(port_id, ret); > > > } > > > > > > +int > > > +rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port) { > > > + struct rte_eth_dev *dev; > > > + struct rte_eth_dev *rdev; > > > + uint16_t p; > > > + uint16_t rp; > > > + int ret = 0; > > > + > > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(tx_port, -EINVAL); > > > + dev = &rte_eth_devices[tx_port]; > > > + if (!dev->data->dev_started) { > > > + RTE_ETHDEV_LOG(ERR, "TX port %d is not started", > > tx_port); > > > + return -EBUSY; > > > + } > > > + > > > + /* > > > + * If the all the ports probed belong to two or more separate > > NICs, it > > > + * is recommended that each pair is bound independently but > > not in the > > > + * loop to bind all ports. > > > + */ > > > > I don't understand your comment. > > I mean, in the following loops, if the application wants to bind one TX port > to all > the other RX ports, it would not be supported between different PMDs or > different NICs. Then the roll-back actions will be done and no hairpin port > peers > will work successfully. >
O.K. > > > > > + if (rx_port == RTE_MAX_ETHPORTS) { > > > > I think maybe this should be done in the tx queue. Since if the bind > > don't need some port why do we care if it is started? > > Do you mean the checking in the RX ports loop below? At first, I intended to > make sure this API would be called only after all the ports are started. > But yes, there is no need if some port is not being used. > I mean move the check to the tx pmd, if for some reason (peer device not started) then it should fail. > > So either add a new function to get all peer ports from the tx port, > > or move this logic to the Target PMD. > > There would be one more API to be introduced. To my understanding, it would > be more appropriate if we add a new API here. > But it would keep RTE simpler if we move such logic into the PMD. > I'm fine with moving the logic to the PMD as long as it can check the status. which he can using the internal function right? > > > > > + RTE_ETH_FOREACH_DEV(p) { > > > + rdev = &rte_eth_devices[p]; > > > + if (!rdev->data->dev_started) { > > > + RTE_ETHDEV_LOG(ERR, > > > + "RX port %d is not started", p); > > > + ret = -EBUSY; > > > + goto unbind; > > > + } > > > + ret = (*dev->dev_ops->hairpin_bind)(dev, p); > > > + if (ret) { > > > + RTE_ETHDEV_LOG(ERR, "Failed to bind hairpin > > > TX " > > > + "%d to RX %d", tx_port, p); > > > + goto unbind; > > > + } > > > + } > > > + } else { > > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(rx_port, -EINVAL); > > > + rdev = &rte_eth_devices[rx_port]; > > > + if (!rdev->data->dev_started) { > > > + RTE_ETHDEV_LOG(ERR, > > > + "RX port %d is not started", rx_port); > > > + return -EBUSY; > > > + } > > > + ret = (*dev->dev_ops->hairpin_bind)(dev, rx_port); > > > + if (ret) > > > + RTE_ETHDEV_LOG(ERR, "Failed to bind hairpin TX %d > > " > > > + "to RX %d", tx_port, rx_port); > > > + } > > > + > > > + return ret; > > > + > > > +unbind: > > > + /* Roll back the previous binding process. */ > > > + RTE_ETH_FOREACH_DEV(rp) { > > > + if (rp < p) > > > + (*dev->dev_ops->hairpin_unbind)(dev, rp); > > > + else > > > + break; > > > + } > > > + return ret; > > > +} > > > + > > > +int > > > +rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port) { > > > + struct rte_eth_dev *dev; > > > + struct rte_eth_dev *rdev; > > > + uint16_t p; > > > + int ret = 0; > > > + > > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(tx_port, -EINVAL); > > > + dev = &rte_eth_devices[tx_port]; > > > + if (!dev->data->dev_started) { > > > + RTE_ETHDEV_LOG(ERR, "TX port %d is stopped", tx_port); > > > + return -EBUSY; > > > + } > > > + > > > + if (rx_port == RTE_MAX_ETHPORTS) { > > > + RTE_ETH_FOREACH_DEV(p) { > > > + rdev = &rte_eth_devices[p]; > > > + if (!rdev->data->dev_started) { > > > > This condition should never be true. > > First see my comment above about the list of devices, second port > > should fail to stop if it is bounded. > > Thank, I will remove this check and make the logic the same as "bind" API. > When stopping the second port, PMD will check if any peer port is bound and > return a failure then. > Great we should also consider what happens in case of hot plug, In this case the hot plug port should be able to unbind himself from all ports. > > > > > + RTE_ETHDEV_LOG(ERR, "RX port %d is > > > stopped", p); > > > + ret = -EBUSY; > > > + break; > > > + } > > > + ret = (*dev->dev_ops->hairpin_unbind)(dev, p); > > > + if (ret) { > > > + RTE_ETHDEV_LOG(ERR, "Failed to unbind > > > hairpin " > > > + "TX %d from RX %d", tx_port, p); > > > + break; > > > + } > > > + } > > > + } else { > > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(rx_port, -EINVAL); > > > + rdev = &rte_eth_devices[rx_port]; > > > + if (!rdev->data->dev_started) { > > > + RTE_ETHDEV_LOG(ERR, "RX port %d is stopped", > > > rx_port); > > > + return -EBUSY; > > > + } > > > + ret = (*dev->dev_ops->hairpin_unbind)(dev, rx_port); > > > + } > > > + > > > + return ret; > > > +} > > > + > > > void > > > rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t > > unsent, > > > void *userdata __rte_unused) > > > diff --git a/lib/librte_ethdev/rte_ethdev.h > > > b/lib/librte_ethdev/rte_ethdev.h index 645a186..c3fb684 100644 > > > --- a/lib/librte_ethdev/rte_ethdev.h > > > +++ b/lib/librte_ethdev/rte_ethdev.h > > > @@ -2133,6 +2133,57 @@ int rte_eth_tx_hairpin_queue_setup > > > const struct rte_eth_hairpin_conf *conf); > > > > > > /** > > > + * @warning > > > + * @b EXPERIMENTAL: this API may change, or be removed, without > > prior > > > notice > > > + * > > > + * Bind all hairpin TX queues of one port to the RX queues of the > > peer port. > > > + * It is only allowed to call this API after all hairpin queues > > are > > > +configured > > > + * properly and the devices of TX and peer RX are in started > > state. > > > + * > > > + * @param tx_port > > > + * The TX port identifier of the Ethernet device. > > > + * @param rx_port > > > + * The peer RX port identifier of the Ethernet device. > > > + * RTE_MAX_ETHPORTS is allowed for the traversal of all devices. > > > + * RX port ID could have the same value with TX port ID. > > > + * > > > + * @return > > > + * - (0) if successful. > > > + * - (-EINVAL) if bad parameter. > > > + * - (-EBUSY) if device is not in started state. > > > + * - (-ENOTSUP) if hardware doesn't support. > > > + * - Others detailed errors from PMD drivers. > > > + */ > > > +__rte_experimental > > > +int rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port); > > > + > > > +/** > > > + * @warning > > > + * @b EXPERIMENTAL: this API may change, or be removed, without > > prior > > > notice > > > + * > > > + * Unbind all hairpin TX queues of one port from the RX queues of > > the > > > + peer > > > port. > > > + * This should be called before closing the TX or RX devices > > > +(optional). After > > > + * unbind the hairpin ports pair, it is allowed to bind them > > again. > > > + * Changing queues configuration should be after stopping the > > device. > > > + * > > > + * @param tx_port > > > + * The TX port identifier of the Ethernet device. > > > + * @param rx_port > > > + * The peer RX port identifier of the Ethernet device. > > > + * RTE_MAX_ETHPORTS is allowed for traversal of all devices. > > > + * RX port ID could have the same value with TX port ID. > > > + * > > > + * @return > > > + * - (0) if successful. > > > + * - (-EINVAL) if bad parameter. > > > + * - (-EBUSY) if device is in stopped state. > > > + * - (-ENOTSUP) if hardware doesn't support. > > > + * - Others detailed errors from PMD drivers. > > > + */ > > > +__rte_experimental > > > +int rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port); > > > + > > > +/** > > > * Return the NUMA socket to which an Ethernet device is > > connected > > > * > > > * @param port_id > > > diff --git a/lib/librte_ethdev/rte_ethdev_driver.h > > > b/lib/librte_ethdev/rte_ethdev_driver.h > > > index 04ac8e9..910433f 100644 > > > --- a/lib/librte_ethdev/rte_ethdev_driver.h > > > +++ b/lib/librte_ethdev/rte_ethdev_driver.h > > > @@ -575,6 +575,54 @@ typedef int (*eth_tx_hairpin_queue_setup_t) > > > const struct rte_eth_hairpin_conf *hairpin_conf); > > > > > > /** > > > + * @internal > > > + * Bind all hairpin TX queues of one port to the RX queues of the > > peer port. > > > + * > > > + * @param dev > > > + * ethdev handle of port. > > > + * @param rx_port > > > + * the peer RX port. > > > + * > > > + * @return > > > + * Negative errno value on error, 0 on success. > > > + * > > > + * @retval 0 > > > + * Success, bind successfully. > > > + * @retval -ENOTSUP > > > + * Bind API is not supported. > > > + * @retval -EINVAL > > > + * One of the parameters is invalid. > > > + * @retval -EBUSY > > > + * Device is not started. > > > + */ > > > +typedef int (*eth_hairpin_bind_t)(struct rte_eth_dev *dev, > > > + uint16_t rx_port); > > > + > > > +/** > > > + * @internal > > > + * Unbind all hairpin TX queues of one port from the RX queues of > > the > > > +peer > > > port. > > > + * > > > + * @param dev > > > + * ethdev handle of port. > > > + * @param rx_port > > > + * the peer RX port. > > > + * > > > + * @return > > > + * Negative errno value on error, 0 on success. > > > + * > > > + * @retval 0 > > > + * Success, bind successfully. > > > + * @retval -ENOTSUP > > > + * Bind API is not supported. > > > + * @retval -EINVAL > > > + * One of the parameters is invalid. > > > + * @retval -EBUSY > > > + * Device is already stopped. > > > + */ > > > +typedef int (*eth_hairpin_unbind_t)(struct rte_eth_dev *dev, > > > + uint16_t rx_port); > > > + > > > +/** > > > * @internal A structure containing the functions exported by an > > > Ethernet driver. > > > */ > > > struct eth_dev_ops { > > > @@ -713,6 +761,10 @@ struct eth_dev_ops { > > > /**< Set up device RX hairpin queue. */ > > > eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup; > > > /**< Set up device TX hairpin queue. */ > > > + eth_hairpin_bind_t hairpin_bind; > > > + /**< Bind all hairpin TX queues of device to the peer port RX > > queues. */ > > > + eth_hairpin_unbind_t hairpin_unbind; > > > + /**< Unbind all hairpin TX queues from the peer port RX queues. > > */ > > > }; > > > > > > /** > > > diff --git a/lib/librte_ethdev/rte_ethdev_version.map > > > b/lib/librte_ethdev/rte_ethdev_version.map > > > index c95ef51..18efe4e 100644 > > > --- a/lib/librte_ethdev/rte_ethdev_version.map > > > +++ b/lib/librte_ethdev/rte_ethdev_version.map > > > @@ -227,6 +227,8 @@ EXPERIMENTAL { > > > rte_tm_wred_profile_delete; > > > > > > # added in 20.11 > > > + rte_eth_hairpin_bind; > > > + rte_eth_hairpin_unbind; > > > rte_eth_link_speed_to_str; > > > rte_eth_link_to_str; > > > }; > > > -- > > > 2.5.5 > > Thanks