The current hairpin implementation supports single port mode only
(e.g. in the testpmd application): traffic is sent out from the same
port it arrived on. Some NICs have no such restriction, and there is
strong demand for a two ports hairpin mode in real-life use cases.
Two ports hairpin mode does not mean that hairpin supports exactly
two ports in a single application. Today's single port hairpin must
still be supported for compatibility. 'Two ports' means that the
ingress and egress ports of the traffic can be different. In
addition, there is no restriction that
  1. traffic from the same ingress port must go to the same egress
     port
  2. traffic from a port that acts as 'egress' for other traffic flows
     must go to their 'ingress' port
The configuration should be flexible and the behavior of traffic will
be decided by the rte flows.

Usually, all the hairpin configurations except flows should be done
during the startup phase. This means that the hairpin TXQ and its
peer RXQ should be bound together. This is feasible in single port
mode and transparent to the application. In two ports mode, queue
configuration and binding raise some problems.
  1. Once TXQ & RXQ belong to different ports, it is hard to
     configure the first port before the initialization of the second
     port is done. It is also not proper to configure the first port
     while the second one is starting.
  2. Ports can be attached and detached dynamically. Hairpin
     between these ports should support dynamic configuration.

In two ports hairpin mode, the TXQ and RXQ belong to different ports.
If some actions need to be done in the TX part, the egress flow can
be inserted explicitly and managed separately from the RX part.
Moreover, one egress flow can be shared by different ingress flows
from the same or different ports.

To satisfy these requirements, some changes to the current rte ethdev
and flow APIs are needed, and some new APIs will be introduced.

1. Data structures in 'rte_ethdev.h'
Two new members, 'tx_explicit' and 'manual_bind', are added.
struct rte_eth_hairpin_conf {
        uint16_t peer_count; /**< The number of peers. */
        struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
        uint16_t tx_explicit;
        uint16_t manual_bind;
};
'tx_explicit': If 0, the PMD will insert the egress flow implicitly.
If 1, the application will insert it by itself.
'manual_bind': If 0, the PMD will try to bind the hairpin TXQ and its
RXQ peer automatically, as in today's single port hairpin mode; this
is for backward compatibility. If 1, the manual bind API must be
called.
As today, the application should ensure that the hairpin peer
configurations of TX & RX do not conflict, and the PMD may check
this internally. For the new member 'tx_explicit', all queue pairs
from one ingress port to the same egress port are suggested to have
the same value in order to avoid confusion, like in RSS cases.
For the new member 'manual_bind', the same suggestion applies.
Support for the new members depends on the NIC's capability and on
real-life usage by the application.

2. New macros in 'rte_ethdev.h'
RTE_ETH_HAIRPIN_BIND_AUTO (0)
RTE_ETH_HAIRPIN_BIND_MANUAL (1)
RTE_ETH_HAIRPIN_TXRULE_IMPLICIT (0)
RTE_ETH_HAIRPIN_TXRULE_EXPLICIT (1)
These are used for the new members in 'struct rte_eth_hairpin_conf'.
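
As a rough illustration of how an application might fill the
configuration for two ports mode, here is a minimal sketch. The
struct, macro values and 'struct rte_eth_hairpin_peer' layout below
are local stand-ins mirroring the definitions in this cover letter so
that the snippet compiles without DPDK headers; the helper
'hairpin_conf_two_ports' is hypothetical and not part of the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the definitions proposed for 'rte_ethdev.h'. */
#define RTE_ETH_MAX_HAIRPIN_PEERS 32 /* assumed value */
#define RTE_ETH_HAIRPIN_BIND_AUTO (0)
#define RTE_ETH_HAIRPIN_BIND_MANUAL (1)
#define RTE_ETH_HAIRPIN_TXRULE_IMPLICIT (0)
#define RTE_ETH_HAIRPIN_TXRULE_EXPLICIT (1)

struct rte_eth_hairpin_peer {
	uint16_t port;  /* peer port ID */
	uint16_t queue; /* peer queue index */
};

struct rte_eth_hairpin_conf {
	uint16_t peer_count; /* the number of peers */
	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
	uint16_t tx_explicit;
	uint16_t manual_bind;
};

/*
 * Hypothetical helper: build a hairpin queue configuration whose peers
 * may live on another port, with explicit TX rules and manual binding.
 */
static struct rte_eth_hairpin_conf
hairpin_conf_two_ports(const struct rte_eth_hairpin_peer *peers, uint16_t n)
{
	struct rte_eth_hairpin_conf conf;

	assert(peers != NULL && n > 0 && n <= RTE_ETH_MAX_HAIRPIN_PEERS);
	memset(&conf, 0, sizeof(conf));
	conf.peer_count = n;
	memcpy(conf.peers, peers, n * sizeof(*peers));
	/* The application inserts the egress flow itself. */
	conf.tx_explicit = RTE_ETH_HAIRPIN_TXRULE_EXPLICIT;
	/* Binding is deferred until the bind API is called. */
	conf.manual_bind = RTE_ETH_HAIRPIN_BIND_MANUAL;
	return conf;
}
```

Leaving both new members at 0 (which the memset already does) keeps
the automatic, implicit behavior of today's single port mode.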

3. New function APIs in 'rte_ethdev.h'
* int rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port)
* typedef int (*eth_hairpin_bind)(struct rte_eth_dev *dev,
                                uint16_t rx_port);
This function binds the egress of one port to the ingress of its peer
port. If 'rx_port' is equal to RTE_MAX_ETHPORTS, all the ports will
be traversed to bind the hairpin egress queues of 'tx_port' to all of
their configured ingress queues. The application needs to call it
once per egress port to bind all egress ports.
It should be called after the hairpin queues are set up and the
devices are started. If 'manual_bind' is not set, there is no need to
call this API. A function pointer of type 'eth_hairpin_bind' should
be provided by the PMD to execute the hardware setting in the driver.
A return value of 0 means success; a negative value is returned to
indicate the actual failure.

* int rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port)
* typedef int (*eth_hairpin_unbind)(struct rte_eth_dev *dev,
                                    uint16_t rx_port);
This function unbinds the egress of one port from the ingress of its
peer port. Only one hairpin direction is unbound; unbinding the
opposite direction needs another call of this API.
If 'rx_port' is equal to RTE_MAX_ETHPORTS, all the ports will be
traversed to unbind the queues (if any). The application needs to
call it once per egress port to unbind all egress ports.
The API can be called without stopping or closing the eth device,
but the application should ensure that the flows inserted for the
hairpin port pair are handled properly. The traffic behavior should
be predictable after unbinding. It is suggested to remove, on both
ports, all the flows for the direction of the port pair that is to be
unbound.
A function pointer of type 'eth_hairpin_unbind' should be provided
by the PMD to execute the hardware setting in the driver.
A return value of 0 means success; a negative value is returned to
indicate the actual failure.
After unbinding, the bind API can be called again to re-enable the
hairpin. Reconfiguring the peers is not supported for now without
closing the devices.
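
The bind and unbind semantics described above (one direction per
call, RTE_MAX_ETHPORTS as a wildcard for 'rx_port') can be modeled
roughly as follows. This is only a sketch in plain C with no hardware
access: the 'peered' and 'bound' tables, the helper names and the
RTE_MAX_ETHPORTS value are assumptions made for illustration, and
returning -EINVAL for a port without configured peer queues is one
possible choice, not something the proposal specifies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define RTE_MAX_ETHPORTS 32 /* stand-in; a build-time setting in DPDK */

/*
 * Hypothetical bookkeeping, not part of the proposal:
 * peered[tx][rx] - tx has hairpin TXQs whose configured peer is rx
 * bound[tx][rx]  - the tx -> rx direction is currently bound
 */
static bool peered[RTE_MAX_ETHPORTS][RTE_MAX_ETHPORTS];
static bool bound[RTE_MAX_ETHPORTS][RTE_MAX_ETHPORTS];

static int
hairpin_set_bound(uint16_t tx_port, uint16_t rx_port, bool state)
{
	uint16_t p;

	if (tx_port >= RTE_MAX_ETHPORTS || rx_port > RTE_MAX_ETHPORTS)
		return -EINVAL;
	if (rx_port != RTE_MAX_ETHPORTS) {
		/* Assumption: a port with no configured peers is an error. */
		if (!peered[tx_port][rx_port])
			return -EINVAL;
		bound[tx_port][rx_port] = state;
		return 0;
	}
	/* Wildcard: visit every port, touching only configured peers. */
	for (p = 0; p < RTE_MAX_ETHPORTS; p++)
		if (peered[tx_port][p])
			bound[tx_port][p] = state;
	return 0;
}

/* Models rte_eth_hairpin_bind(): affects the tx -> rx direction only. */
static int
hairpin_bind_sketch(uint16_t tx_port, uint16_t rx_port)
{
	return hairpin_set_bound(tx_port, rx_port, true);
}

/* Models rte_eth_hairpin_unbind(): the reverse direction is untouched. */
static int
hairpin_unbind_sketch(uint16_t tx_port, uint16_t rx_port)
{
	return hairpin_set_bound(tx_port, rx_port, false);
}
```

In this model, unbinding port 0 -> port 1 leaves port 1 -> port 0
bound, which matches the requirement that the opposite direction
needs its own call.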

4. New rte_flow item
* RTE_FLOW_ITEM_TYPE_TX_QUEUE
struct rte_flow_item_tx_queue {
        uint32_t queue;
};
This provides a new item to match on for an egress packet. In two
ports hairpin mode, since the TX rules can be inserted explicitly on
the egress port, it is hard to distinguish the hairpin packets from
the packets sent by software. Even with metadata, this may require
complex management. Support for the new rte_flow item is optional,
depending on the NIC's capability. With this item, a few wildcard
rules can be inserted for hairpin to support some common actions.

When switching to two ports hairpin mode with explicit TX rules,
metadata can be used to provide the 'connection' for a packet
between ingress & egress, for two reasons.
1. The packet header might be changed by NAT or DECAP in the
   ingress, so the inner header or other parts may be different.
2. Different ingress flow rules can share the same egress rule to
   simplify rule management.
Example rte_flow rules are shown below (port 0 RX X -> port 1 TX Y):

flow create 0 ingress group M pattern eth / … / end actions queue index is X / set_meta data is V / end
X is the ingress hairpin queue index.

flow create 1 egress group N pattern eth / meta data is V / end actions vxlan_encap / end

flow create 1 egress group 0 pattern eth / tx_queue index is Y / end actions jump group N / end
Y is the egress hairpin queue index. This wildcard flow redirects all
the ethernet packets from hairpin TX queue Y to a specific group for
further handling. Meanwhile, other traffic sent from software is not
impacted by this wildcard rule.

To verify this in testpmd, some changes are also required.
1. During the startup phase, hairpin binding will use chaining mode.
   E.g. if 3 ports are probed, hairpin traffic will be like this
   port A -> port B, port B -> port C, port C -> port A
   If only a single port is probed
   port A -> port A
2. The flow command line will add support for parsing the tx queue
   index
   pattern format: tx_queue index is UNSIGNED / ...
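
The chaining mode in item 1 amounts to peering each probed port with
the next one and wrapping around at the end. A minimal sketch,
assuming the probed ports are held in an array and addressed by their
position in it (the helper name 'hairpin_chain_peer' is hypothetical
and does not come from the testpmd patch):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the position of the hairpin peer for the
 * port at position 'port_idx' when 'nb_ports' ports are probed.
 * The last port wraps around to the first one, so a single probed port
 * is peered with itself.
 */
static uint16_t
hairpin_chain_peer(uint16_t port_idx, uint16_t nb_ports)
{
	assert(nb_ports > 0 && port_idx < nb_ports);
	return (uint16_t)((port_idx + 1) % nb_ports);
}
```

Positions are used instead of raw port IDs because the IDs of probed
ports are not necessarily contiguous.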

Thanks

Signed-off-by: Bing Zhao <bi...@nvidia.com>

Bing Zhao (4):
  ethdev: add support for flow item transmit queue
  testpmd: add item transmit queue in flow CLI
  ethdev: add hairpin bind APIs
  ethdev: add new attributes to hairpin queues config

 app/test-pmd/cmdline_flow.c           |  18 ++++++
 lib/librte_ethdev/rte_ethdev.c        | 100 ++++++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h        |  68 +++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev_driver.h |  52 ++++++++++++++++++
 lib/librte_ethdev/rte_flow.c          |   1 +
 lib/librte_ethdev/rte_flow.h          |  30 ++++++++++
 6 files changed, 269 insertions(+)

-- 
2.5.5
