mlx5: support SubFunction

Xueming Li Thu, 27 May 2021 07:03:18 -0700

This patch introduces SubFunction (SF) support. Similar to a VF, an SF on the
auxiliary bus is a portion of the hardware PF; representor and bonding
parameters are not supported for SF.


Devargs to support SF:
-a auxiliary:mlx5_core.sf.8,dv_flow_en=1

New global syntax to support SF:
-a bus=auxiliary,name=mlx5_core.sf.8/class=eth/driver=mlx5,dv_flow_en=1

Signed-off-by: Xueming Li <[email protected]>
---
 doc/guides/nics/mlx5.rst                | 339 +++++++++++++++++++++++-
 drivers/net/mlx5/linux/mlx5_ethdev_os.c |  12 +-
 drivers/net/mlx5/linux/mlx5_os.c        | 142 +++++++---
 drivers/net/mlx5/linux/mlx5_os.h        |   2 +
 drivers/net/mlx5/mlx5.c                 |  10 +-
 drivers/net/mlx5/mlx5.h                 |   1 +
 drivers/net/mlx5/mlx5_rxmode.c          |   8 +-
 drivers/net/mlx5/mlx5_trigger.c         |   2 +-
 8 files changed, 452 insertions(+), 64 deletions(-)

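Illustration only (not part of the patch): both devargs forms in the commit
message name the same auxiliary device, "mlx5_core.sf.8", and pass
"dv_flow_en=1" to the driver. The sketch below splits either form into a bus
name and a device name. sf_devargs_split() is a hypothetical helper, not the
EAL devargs parser. It assumes that "bus=" comes before "name=" in the global
form and it ignores the class and driver layers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the EAL devargs parser. Splits either form:
 *   legacy: "<bus>:<name>,<key>=<value>,..."
 *   global: "bus=<bus>,name=<name>/class=.../driver=...,<key>=<value>"
 * Assumes "bus=" precedes "name=" in the global form. Output is silently
 * truncated if bus_len or name_len is too small.
 * Returns 0 on success, -1 if neither form matches.
 */
static int
sf_devargs_split(const char *da, char *bus, size_t bus_len,
		 char *name, size_t name_len)
{
	const char *p;
	const char *end;

	if (strncmp(da, "bus=", 4) == 0) {
		/* Global syntax: the bus layer is "bus=<bus>,name=<name>". */
		p = da + 4;
		end = strchr(p, ',');
		if (end == NULL || strncmp(end, ",name=", 6) != 0)
			return -1;
		snprintf(bus, bus_len, "%.*s", (int)(end - p), p);
		p = end + 6;
		/* The name ends at the next layer ("/") or key (","). */
		end = p + strcspn(p, "/,");
	} else {
		/* Legacy syntax: "<bus>:" prefix, driver arguments after ",". */
		end = strchr(da, ':');
		if (end == NULL)
			return -1;
		snprintf(bus, bus_len, "%.*s", (int)(end - da), da);
		p = end + 1;
		end = p + strcspn(p, ",");
	}
	if (end == p)
		return -1;
	snprintf(name, name_len, "%.*s", (int)(end - p), p);
	return 0;
}
```

In the global syntax the device name ends at the first "/" or ",". In the
legacy form it ends at the first ",", because everything after that comma
is a driver argument.
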
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 83299646dd..3f5692038c 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -403,6 +403,300 @@ Limitations
   - Hairpin between two ports could only manual binding and explicit Tx flow mode. For single port hairpin, all the combinations of auto/manual binding and explicit/implicit Tx flow mode could be supported.
   - Hairpin in switchdev SR-IOV mode is not supported till now.
 
+- Meter:
+
+Limitations
+-----------
+
+- Windows support:
+
+  On Windows, the features are limited:
+
+  - Promiscuous mode is not supported
+  - The following rules are supported:
+
+    - IPv4/UDP with CVLAN filtering
+    - Unicast MAC filtering
+
+- For secondary process:
+
+  - Forked secondary processes are not supported.
+  - External memory unregistered in EAL memseg list cannot be used for DMA
+    unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
+    primary process and remapped to the same virtual address in secondary
+    process. If the external memory is registered by primary process but has
+    different virtual address in secondary process, unexpected errors may happen.
+
+- When using Verbs flow engine (``dv_flow_en`` = 0), flow pattern without any
+  specific VLAN will match for VLAN packets as well:
+
+  When VLAN spec is not specified in the pattern, the matching rule will be created with VLAN as a wild card.
+  Meaning, the flow rule::
+
+        flow create 0 ingress pattern eth / vlan vid is 3 / ipv4 / end ...
+
+  Will only match VLAN packets with vid=3, and the flow rule::
+
+        flow create 0 ingress pattern eth / ipv4 / end ...
+
+  Will match any ipv4 packet (VLAN included).
+
+- When using Verbs flow engine (``dv_flow_en`` = 0), multi-tagged (QinQ) match is not supported.
+
+- When using DV flow engine (``dv_flow_en`` = 1), flow pattern with any VLAN specification will match only single-tagged packets unless the ETH item ``type`` field is 0x88A8 or the VLAN item ``has_more_vlan`` field is 1.
+  The flow rule::
+
+        flow create 0 ingress pattern eth / ipv4 / end ...
+
+  Will match any ipv4 packet.
+  The flow rules::
+
+        flow create 0 ingress pattern eth / vlan / end ...
+        flow create 0 ingress pattern eth has_vlan is 1 / end ...
+        flow create 0 ingress pattern eth type is 0x8100 / end ...
+
+  Will match single-tagged packets only, with any VLAN ID value.
+  The flow rules::
+
+        flow create 0 ingress pattern eth type is 0x88A8 / end ...
+        flow create 0 ingress pattern eth / vlan has_more_vlan is 1 / end ...
+
+  Will match multi-tagged packets only, with any VLAN ID value.
+
+- A flow pattern with 2 sequential VLAN items is not supported.
+
+- VLAN pop offload command:
+
+  - Flow rules having a VLAN pop offload command as one of their actions and
+    are lacking a match on VLAN as one of their items are not supported.
+  - The command is not supported on egress traffic in NIC mode.
+
+- VLAN push offload is not supported on ingress traffic in NIC mode.
+
+- VLAN set PCP offload is not supported on existing headers.
+
+- A multi-segment packet must not have more segments than reported by dev_infos_get()
+  in the tx_desc_lim.nb_seg_max field. This value depends on the maximal supported Tx descriptor
+  size and ``txq_inline_min`` settings and may be from 2 (worst case forced by maximal
+  inline settings) to 58.
+
+- Flows with a VXLAN Network Identifier equal (or ends to be equal)
+  to 0 are not supported.
+
+- L3 VXLAN and VXLAN-GPE tunnels cannot be supported together with MPLSoGRE and MPLSoUDP.
+
+- Match on Geneve header supports the following fields only:
+
+     - VNI
+     - OAM
+     - protocol type
+     - options length
+
+- Match on Geneve TLV option is supported on the following fields:
+
+     - Class
+     - Type
+     - Length
+     - Data
+
+  Only one Class/Type/Length Geneve TLV option is supported per shared device.
+  Class/Type/Length fields must be specified as well as masks.
+  Class/Type/Length specified masks must be full.
+  Matching Geneve TLV option without specifying data is not supported.
+  Matching Geneve TLV option with ``data & mask == 0`` is not supported.
+
+- VF: flow rules created on VF devices can only match traffic targeted at the
+  configured MAC addresses (see ``rte_eth_dev_mac_addr_add()``).
+
+- Match on GTP tunnel header item supports the following fields only:
+
+     - v_pt_rsv_flags: E flag, S flag, PN flag
+     - msg_type
+     - teid
+
+- Match on GTP extension header only for GTP PDU session container (next
+  extension header type = 0x85).
+- Match on GTP extension header is not supported in group 0.
+
+- No Tx metadata go to the E-Switch steering domain for the Flow group 0.
+  The flows within group 0 and set metadata action are rejected by hardware.
+
+.. note::
+
+   MAC addresses not already present in the bridge table of the associated
+   kernel network device will be added and cleaned up by the PMD when closing
+   the device. In case of ungraceful program termination, some entries may
+   remain present and should be removed manually by other means.
+
+- Buffer split offload is supported with regular Rx burst routine only,
+  no MPRQ feature or vectorized code can be engaged.
+
+- When Multi-Packet Rx queue is configured (``mprq_en``), a Rx packet can be
+  externally attached to a user-provided mbuf with having EXT_ATTACHED_MBUF in
+  ol_flags. As the mempool for the external buffer is managed by PMD, all the
+  Rx mbufs must be freed before the device is closed. Otherwise, the mempool of
+  the external buffers will be freed by PMD and the application which still
+  holds the external buffers may be corrupted.
+
+- If Multi-Packet Rx queue is configured (``mprq_en``) and Rx CQE compression is
+  enabled (``rxq_cqe_comp_en``) at the same time, RSS hash result is not fully
+  supported. Some Rx packets may not have PKT_RX_RSS_HASH.
+
+- IPv6 Multicast messages are not supported on VM, while promiscuous mode
+  and allmulticast mode are both set to off.
+  To receive IPv6 Multicast messages on VM, explicitly set the relevant
+  MAC address using rte_eth_dev_mac_addr_add() API.
+
+- To support a mixed traffic pattern (some buffers from local host memory, some
+  buffers from other devices) with high bandwidth, a mbuf flag is used.
+
+  An application hints the PMD whether or not it should try to inline the
+  given mbuf data buffer. The PMD should make a best effort to act upon this request.
+
+  The hint flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE`` is dynamic,
+  registered by application with rte_mbuf_dynflag_register(). This flag is
+  purely driver-specific and declared in PMD specific header ``rte_pmd_mlx5.h``,
+  which is intended to be used by the application.
+
+  To query the supported specific flags in runtime,
+  the function ``rte_pmd_mlx5_get_dyn_flag_names`` returns the array of
+  currently (over present hardware and configuration) supported specific flags.
+  The "not inline hint" feature operating flow is the following one:
+
+    - application starts
+    - probe the devices, ports are created
+    - query the port capabilities
+    - if port supporting the feature is found
+    - register dynamic flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE``
+    - application starts the ports
+    - on ``dev_start()`` PMD checks whether the feature flag is registered and
+      enables the feature support in datapath
+    - application might set the registered flag bit in ``ol_flags`` field
+      of mbuf being sent and PMD will handle ones appropriately.
+
+- The amount of descriptors in Tx queue may be limited by data inline settings.
+  Inline data require more descriptor building blocks and the overall block
+  amount may exceed the hardware supported limits. The application should
+  reduce the requested Tx size or adjust data inline settings with
+  ``txq_inline_max`` and ``txq_inline_mpw`` devargs keys.
+
+- To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
+  parameter should be specified.
+  When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME set on the packet
+  being sent it tries to synchronize the time of packet appearing on
+  the wire with the specified packet timestamp. If the specified one
+  is in the past, it should be ignored; if it is in the distant future,
+  it should be capped with some reasonable value (in range of seconds).
+  These specific cases ("too late" and "distant future") can be optionally
+  reported via device xstats to assist applications to detect the
+  time-related problems.
+
+  The timestamp upper "too-distant-future" limit
+  at the moment of invoking the Tx burst routine
+  can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
+  Please note, for the testpmd txonly mode,
+  the limit is deduced from the expression::
+
+        (n_tx_descriptors / burst_size + 1) * inter_burst_gap
+
+  No packet reordering according to timestamps is performed,
+  either within a packet burst or between packets; it is entirely the
+  application's responsibility to generate packets and their timestamps
+  in the desired order. The timestamps can be put only in the first packet
+  in the burst providing the entire burst scheduling.
+
+- E-Switch decapsulation Flow:
+
+  - can be applied to PF port only.
+  - must specify VF port action (packet redirection from PF to VF).
+  - optionally may specify tunnel inner source and destination MAC addresses.
+
+- E-Switch encapsulation Flow:
+
+  - can be applied to VF ports only.
+  - must specify PF port action (packet redirection from VF to PF).
+
+- Raw encapsulation:
+
+  - The input buffer, used as outer header, is not validated.
+
+- Raw decapsulation:
+
+  - The decapsulation is always done up to the outermost tunnel detected by the HW.
+  - The input buffer, providing the removal size, is not validated.
+  - The buffer size must match the length of the headers to be removed.
+
+- ICMP(code/type/identifier/sequence number) / ICMP6(code/type) matching, IP-in-IP and MPLS flow matching are all
+  mutually exclusive features which cannot be supported together
+  (see :ref:`mlx5_firmware_config`).
+
+- LRO:
+
+  - Requires DevX and DV flow to be enabled.
+  - KEEP_CRC offload cannot be supported with LRO.
+  - The first mbuf length, without head-room, must be big enough to include the
+    TCP header (122B).
+  - Rx queue with LRO offload enabled, receiving a non-LRO packet, can forward
+    it with size limited to max LRO size, not to max RX packet length.
+  - LRO can be used with outer header of TCP packets of the standard format:
+        eth (with or without vlan) / ipv4 or ipv6 / tcp / payload
+
+    Other TCP packets (e.g. with MPLS label) received on Rx queue with LRO enabled will be received with bad checksum.
+  - LRO packet aggregation is performed by HW only for packet size larger than
+    ``lro_min_mss_size``. This value is reported on device start, when debug
+    mode is enabled.
+
+- CRC:
+
+  - ``DEV_RX_OFFLOAD_KEEP_CRC`` cannot be supported with decapsulation
+    for some NICs (such as ConnectX-6 Dx, ConnectX-6 Lx, and BlueField-2).
+    The capability bit ``scatter_fcs_w_decap_disable`` shows NIC support.
+
+- TX mbuf fast free:
+
+  - fast free offload assumes that all mbufs being sent originate from the
+    same memory pool and that there are no extra references to the mbufs (the
+    reference counter for each mbuf is equal to 1 on tx_burst call). The latter
+    means there should be no externally attached buffers in mbufs. It is
+    the application's responsibility to provide the correct mbufs if the fast
+    free offload is engaged. The mlx5 PMD implicitly produces the mbufs with
+    externally attached buffers if MPRQ option is enabled, hence, the fast
+    free offload is neither supported nor advertised if there is MPRQ enabled.
+
+- Sample flow:
+
+  - Supports ``RTE_FLOW_ACTION_TYPE_SAMPLE`` action only within NIC Rx and
+    E-Switch steering domain.
+  - For E-Switch Sampling flow with sample ratio > 1, additional actions are not
+    supported in the sample actions list.
+  - For ConnectX-5, the ``RTE_FLOW_ACTION_TYPE_SAMPLE`` is typically used as
+    first action in the E-Switch egress flow if with header modify or
+    encapsulation actions.
+  - For NIC Rx flow, supports ``MARK``, ``COUNT``, ``QUEUE``, ``RSS`` in the
+    sample actions list.
+  - For E-Switch mirroring flow, supports ``RAW ENCAP``, ``Port ID``,
+    ``VXLAN ENCAP``, ``NVGRE ENCAP`` in the sample actions list.
+
+- Modify Field flow:
+
+  - Supports the 'set' operation only for ``RTE_FLOW_ACTION_TYPE_MODIFY_FIELD`` action.
+  - Modification of an arbitrary place in a packet via the special ``RTE_FLOW_FIELD_START`` Field ID is not supported.
+  - Modification of the 802.1Q Tag, VXLAN Network or GENEVE Network IDs is not supported.
+  - Encapsulation levels are not supported; only outermost header fields can be modified.
+  - Offsets must be 32-bits aligned, cannot skip past the boundary of a field.
+
+- IPv6 header item 'proto' field, indicating the next header protocol, should
+  not be set as extension header.
+  In case the next header is an extension header, it should not be specified in
+  IPv6 header item 'proto' field.
+  The last extension header item 'next header' field can specify the following
+  header protocol type.
+
+- Hairpin:
+
+  - Hairpin between two ports supports only manual binding and explicit Tx flow mode. For single-port hairpin, all combinations of auto/manual binding and explicit/implicit Tx flow mode are supported.
+  - Hairpin in switchdev SR-IOV mode is not supported till now.
+
 - Meter:
 
   - All the meter colors with drop action will be counted only by the global drop statistics.
@@ -1438,13 +1732,17 @@ the DPDK application.
 
         echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
 
-Sub-Function representor
-------------------------
+SubFunction support
+-------------------
+A SubFunction (SF) is a portion of a PCI device. An SF netdev has its own
+dedicated queues (txq, rxq). An SF shares PCI-level resources with other SFs
+and/or with its parent PCI function.
 
-Sub-Function is a portion of the PCI device, a SF netdev has its own
-dedicated queues(txq, rxq). A SF netdev supports E-Switch representation
-offload similar to existing PF and VF representors. A SF shares PCI
-level resources with other SFs and/or with its parent PCI function.
+0. Requirements::
+
+        kernel version >= 5.12 or OFED version >= 5.6
+
+        iproute2 >= 5.11
 
 1. Configure SF feature::
 
@@ -1457,21 +1755,34 @@ level resources with other SFs and/or with its parent PCI function.
             2: 32 SFs
             3: 64 SFs
 
-2. Reset the FW::
+2. Enable switchdev mode::
 
-        mlxfwreset -d <mst device> reset
+        devlink dev eswitch set pci/<DBDF> mode switchdev
 
-3. Enable switchdev mode::
+3. Add SF port::
 
-        echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
+        devlink port add pci/<DBDF> flavour pcisf pfnum 0 sfnum <sfnum>
+
+        Get SFID from output: pci/<DBDF>/<SFID>
+
+4. Modify MAC address::
+
+        devlink port function set pci/<DBDF>/<SFID> hw_addr <MAC>
+
+5. Activate SF port::
+
+        devlink port function set pci/<DBDF>/<SFID> state active
 
-4. Create SF::
+6. Devargs to probe SF device::
 
-        mlnx-sf -d <PCI_BDF> -a create
+        auxiliary:mlx5_core.sf.9,dv_flow_en=1
 
-5. Probe SF representor::
+SubFunction representor support
+-------------------------------
+An SF netdev supports E-Switch representation offload similar to the existing PF
+and VF representors. Use <sfnum> to probe the SF representor::
 
-        testpmd> port attach <PCI_BDF>,representor=sf0,dv_flow_en=1
+        testpmd> port attach <PCI_BDF>,representor=sf<sfnum>,dv_flow_en=1
 
 Performance tuning
 ------------------
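
Worked example (not part of the patch): the Tx scheduling limitation in the
mlx5.rst hunk estimates the "too-distant-future" timestamp limit as the tx_pp
value in nanoseconds multiplied by 2^23. It gives the testpmd txonly limit as
(n_tx_descriptors / burst_size + 1) * inter_burst_gap. The helpers below only
evaluate those two expressions. The function names and the sample values
(tx_pp = 500 ns, 1024 descriptors, bursts of 32, a gap of 1000 units) are
illustrative assumptions.

```c
#include <stdint.h>

/* "Too-distant-future" estimate from the doc: tx_pp (ns) * 2^23. */
static uint64_t
tx_pp_future_limit_ns(uint64_t tx_pp_ns)
{
	return tx_pp_ns * (UINT64_C(1) << 23);
}

/*
 * testpmd txonly estimate from the doc, evaluated with C integer division:
 * (n_tx_descriptors / burst_size + 1) * inter_burst_gap.
 * burst_size must be non-zero.
 */
static uint64_t
txonly_future_limit(uint64_t n_tx_descriptors, uint64_t burst_size,
		    uint64_t inter_burst_gap)
{
	return (n_tx_descriptors / burst_size + 1) * inter_burst_gap;
}
```

With tx_pp = 500, the limit is 500 * 8388608 = 4194304000 ns, which is
about 4.2 seconds.
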
diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 6fdb310129..8678502595 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -128,6 +128,17 @@ struct ethtool_link_settings {
 #define ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT 2 /* 66 - 64 */
 #endif
 
+/* Get interface index from SubFunction device name. */
+int
+mlx5_auxiliary_get_ifindex(const char *sf_name)
+{
+       char if_name[IF_NAMESIZE];
+
+       if (mlx5_auxiliary_get_child_name(sf_name, "/net",
+                                         if_name, sizeof(if_name)) != 0)
+               return -rte_errno;
+       return if_nametoindex(if_name);
+}
 
 /**
  * Get interface name from private structure.
@@ -1619,4 +1630,3 @@ mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN])
        memcpy(mac, request.ifr_hwaddr.sa_data, RTE_ETHER_ADDR_LEN);
        return 0;
 }
-
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 4f16230fa5..d74273a7ca 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -20,6 +20,7 @@
 #include <ethdev_pci.h>
 #include <rte_pci.h>
 #include <rte_bus_pci.h>
+#include <rte_bus_auxiliary.h>
 #include <rte_common.h>
 #include <rte_kvargs.h>
 #include <rte_rwlock.h>
@@ -1923,6 +1924,27 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
        return pf;
 }
 
+static void
+mlx5_os_config_default(struct mlx5_dev_config *config)
+{
+       memset(config, 0, sizeof(*config));
+       config->mps = MLX5_ARG_UNSET;
+       config->dbnc = MLX5_ARG_UNSET;
+       config->rx_vec_en = 1;
+       config->txq_inline_max = MLX5_ARG_UNSET;
+       config->txq_inline_min = MLX5_ARG_UNSET;
+       config->txq_inline_mpw = MLX5_ARG_UNSET;
+       config->txqs_inline = MLX5_ARG_UNSET;
+       config->vf_nl_en = 1;
+       config->mr_ext_memseg_en = 1;
+       config->mprq.max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN;
+       config->mprq.min_rxqs_num = MLX5_MPRQ_MIN_RXQS;
+       config->dv_esw_en = 1;
+       config->dv_flow_en = 1;
+       config->decap_en = 1;
+       config->log_hp_size = MLX5_ARG_UNSET;
+}
+
 /**
  * Register a PCI device within bonding.
  *
@@ -2334,23 +2356,8 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
                uint32_t restore;
 
                /* Default configuration. */
-               memset(&dev_config, 0, sizeof(struct mlx5_dev_config));
+               mlx5_os_config_default(&dev_config);
                dev_config.vf = dev_config_vf;
-               dev_config.mps = MLX5_ARG_UNSET;
-               dev_config.dbnc = MLX5_ARG_UNSET;
-               dev_config.rx_vec_en = 1;
-               dev_config.txq_inline_max = MLX5_ARG_UNSET;
-               dev_config.txq_inline_min = MLX5_ARG_UNSET;
-               dev_config.txq_inline_mpw = MLX5_ARG_UNSET;
-               dev_config.txqs_inline = MLX5_ARG_UNSET;
-               dev_config.vf_nl_en = 1;
-               dev_config.mr_ext_memseg_en = 1;
-               dev_config.mprq.max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN;
-               dev_config.mprq.min_rxqs_num = MLX5_MPRQ_MIN_RXQS;
-               dev_config.dv_esw_en = 1;
-               dev_config.dv_flow_en = 1;
-               dev_config.decap_en = 1;
-               dev_config.log_hp_size = MLX5_ARG_UNSET;
                list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
                                                 &list[i],
                                                 &dev_config,
@@ -2407,6 +2414,35 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
        return ret;
 }
 
+static int
+mlx5_os_parse_eth_devargs(struct rte_device *dev,
+                         struct rte_eth_devargs *eth_da)
+{
+       int ret = 0;
+
+       if (dev->devargs == NULL)
+               return 0;
+       memset(eth_da, 0, sizeof(*eth_da));
+       /* Parse representor information first from class argument. */
+       if (dev->devargs->cls_str)
+               ret = rte_eth_devargs_parse(dev->devargs->cls_str, eth_da);
+       if (ret != 0) {
+               DRV_LOG(ERR, "failed to parse device arguments: %s",
+                       dev->devargs->cls_str);
+               return -rte_errno;
+       }
+       if (eth_da->type == RTE_ETH_REPRESENTOR_NONE) {
+               /* Parse legacy device argument */
+               ret = rte_eth_devargs_parse(dev->devargs->args, eth_da);
+               if (ret) {
+                       DRV_LOG(ERR, "failed to parse device arguments: %s",
+                               dev->devargs->args);
+                       return -rte_errno;
+               }
+       }
+       return 0;
+}
+
 /**
  * Callback to register a PCI device.
  *
@@ -2421,31 +2457,13 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 static int
 mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 {
-       struct rte_eth_devargs eth_da = { .type = RTE_ETH_REPRESENTOR_NONE };
+       struct rte_eth_devargs eth_da = { .nb_ports = 0 };
        int ret = 0;
        uint16_t p;
 
-       if (pci_dev->device.devargs) {
-               /* Parse representor information from device argument. */
-               if (pci_dev->device.devargs->cls_str)
-                       ret = rte_eth_devargs_parse
-                               (pci_dev->device.devargs->cls_str, &eth_da);
-               if (ret) {
-                       DRV_LOG(ERR, "failed to parse device arguments: %s",
-                               pci_dev->device.devargs->cls_str);
-                       return -rte_errno;
-               }
-               if (eth_da.type == RTE_ETH_REPRESENTOR_NONE) {
-                       /* Support legacy device argument */
-                       ret = rte_eth_devargs_parse
-                               (pci_dev->device.devargs->args, &eth_da);
-                       if (ret) {
-                               DRV_LOG(ERR, "failed to parse device arguments: %s",
-                                       pci_dev->device.devargs->args);
-                               return -rte_errno;
-                       }
-               }
-       }
+       ret = mlx5_os_parse_eth_devargs(&pci_dev->device, &eth_da);
+       if (ret != 0)
+               return ret;
 
        if (eth_da.nb_ports > 0) {
                /* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */
@@ -2458,10 +2476,53 @@ mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
        return ret;
 }
 
+/* Probe a single SF device on auxiliary bus, no representor support. */
+static int
+mlx5_os_auxiliary_probe(struct rte_device *dev)
+{
+       struct rte_eth_devargs eth_da = { .nb_ports = 0 };
+       struct mlx5_dev_config config;
+       struct mlx5_dev_spawn_data spawn = { .pf_bond = -1 };
+       struct rte_auxiliary_device *adev = RTE_DEV_TO_AUXILIARY(dev);
+       struct rte_eth_dev *eth_dev;
+       int ret = 0;
+
+       /* Parse ethdev devargs. */
+       ret = mlx5_os_parse_eth_devargs(dev, &eth_da);
+       if (ret != 0)
+               return ret;
+       /* Set default config data. */
+       mlx5_os_config_default(&config);
+       config.sf = 1;
+       /* Init spawn data. */
+       spawn.max_port = 1;
+       spawn.phys_port = 1;
+       spawn.phys_dev = mlx5_get_ibv_device(dev);
+       ret = mlx5_auxiliary_get_ifindex(dev->name);
+       if (ret < 0) {
+               DRV_LOG(ERR, "failed to get ethdev ifindex: %s", dev->name);
+               return ret;
+       }
+       spawn.ifindex = ret;
+       /* Spawn device. */
+       eth_dev = mlx5_dev_spawn(dev, &spawn, &config, &eth_da);
+       if (eth_dev == NULL)
+               return -rte_errno;
+       /* Post create. */
+       eth_dev->intr_handle = &adev->intr_handle;
+       if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+               eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
+               eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_RMV;
+               eth_dev->data->numa_node = dev->numa_node;
+       }
+       rte_eth_dev_probing_finish(eth_dev);
+       return 0;
+}
+
 /**
  * Common bus driver callback to probe a device.
  *
- * This function probe PCI bus device(s).
+ * This function probes PCI bus device(s) or a single SF on the auxiliary bus.
  *
  * @param[in] dev
  *   Pointer to the generic device.
@@ -2484,7 +2545,8 @@ mlx5_os_net_probe(struct rte_device *dev)
        }
        if (mlx5_dev_is_pci(dev))
                return mlx5_os_pci_probe(RTE_DEV_TO_PCI(dev));
-       return 0;
+       else
+               return mlx5_os_auxiliary_probe(dev);
 }
 
 static int
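
Sketch (not part of the patch): mlx5_os_parse_eth_devargs() checks the class
string first. It parses the legacy device arguments only when the class
string yields no representor. The code mirrors that order with a simplified
stand-in for struct rte_eth_devargs. sketch_parse() is hypothetical and only
detects a "representor=" key, unlike rte_eth_devargs_parse().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct rte_eth_devargs. */
struct sketch_eth_da {
	int has_representor; /* ~ type != RTE_ETH_REPRESENTOR_NONE */
};

/* Hypothetical parser: only looks for a "representor=" key. */
static void
sketch_parse(const char *kvargs, struct sketch_eth_da *da)
{
	if (kvargs != NULL && strstr(kvargs, "representor=") != NULL)
		da->has_representor = 1;
}

/*
 * Same order as mlx5_os_parse_eth_devargs(): class string first, legacy
 * device arguments only if no representor was found.
 * Returns 1 if the class string was used, 2 for the legacy arguments,
 * 0 if neither carries a representor.
 */
static int
sketch_parse_eth_devargs(const char *cls_str, const char *args,
			 struct sketch_eth_da *da)
{
	memset(da, 0, sizeof(*da));
	sketch_parse(cls_str, da);
	if (da->has_representor)
		return 1;
	sketch_parse(args, da);
	return da->has_representor ? 2 : 0;
}
```

Factoring this into one helper lets the PCI and auxiliary probe paths share
it. For an SF, mlx5_os_auxiliary_probe() runs the same parsing, but the patch
notes that this path has no representor support.
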
diff --git a/drivers/net/mlx5/linux/mlx5_os.h b/drivers/net/mlx5/linux/mlx5_os.h
index af7cbeb418..2991d37df2 100644
--- a/drivers/net/mlx5/linux/mlx5_os.h
+++ b/drivers/net/mlx5/linux/mlx5_os.h
@@ -19,4 +19,6 @@ enum {
 
 #define MLX5_NAMESIZE IF_NAMESIZE
 
+int mlx5_auxiliary_get_ifindex(const char *sf_name);
+
 #endif /* RTE_PMD_MLX5_OS_H_ */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 3defdb2db3..69edd55b86 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2319,10 +2319,12 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_eth_dev *odev)
                        if (opriv->sh == priv->sh ||
                            odev->device == dev->device)
                                break;
-               } else if (dev->device != NULL && dev->device->driver &&
-                       dev->device->driver->name &&
-                       !strcmp(dev->device->driver->name,
-                               MLX5_PCI_DRIVER_NAME)) {
+               } else if (dev->device != NULL && dev->device->driver != NULL &&
+                       dev->device->driver->name != NULL &&
+                       (strcmp(dev->device->driver->name,
+                               MLX5_PCI_DRIVER_NAME) == 0 ||
+                        strcmp(dev->device->driver->name,
+                               MLX5_AUXILIARY_DRIVER_NAME) == 0)) {
                        /* odev not specified, found all mlx5 devices. */
                        break;
                }
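
Sketch (not part of the patch): the updated condition in mlx5_eth_find_next()
accepts a device bound to either the mlx5 PCI driver or the mlx5 auxiliary
driver. A table-driven check keeps that condition short if more bus drivers
are added later. The driver-name strings below are assumed values for
illustration. The real MLX5_PCI_DRIVER_NAME and MLX5_AUXILIARY_DRIVER_NAME
macros are defined in the mlx5 common headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed values; the real macros live in the mlx5 common headers. */
#define SKETCH_PCI_DRIVER_NAME "mlx5_pci"
#define SKETCH_AUXILIARY_DRIVER_NAME "mlx5_auxiliary"

/* Return true if drv_name is one of the mlx5 bus drivers. */
static bool
sketch_is_mlx5_driver(const char *drv_name)
{
	static const char *const names[] = {
		SKETCH_PCI_DRIVER_NAME,
		SKETCH_AUXILIARY_DRIVER_NAME,
	};
	size_t i;

	if (drv_name == NULL)
		return false;
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strcmp(drv_name, names[i]) == 0)
			return true;
	return false;
}
```
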
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 27bb34e827..b06f45fc54 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -220,6 +220,7 @@ struct mlx5_dev_config {
        unsigned int hw_fcs_strip:1; /* FCS stripping is supported. */
        unsigned int hw_padding:1; /* End alignment padding is supported. */
        unsigned int vf:1; /* This is a VF. */
+       unsigned int sf:1; /* This is a SF. */
        unsigned int tunnel_en:1;
        /* Whether tunnel stateless offloads are supported. */
        unsigned int mpls_en:1; /* MPLS over GRE/UDP is enabled. */
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 25fb47c9ed..7f19b235c2 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -36,7 +36,7 @@ mlx5_promiscuous_enable(struct rte_eth_dev *dev)
                        dev->data->port_id);
                return 0;
        }
-       if (priv->config.vf) {
+       if (priv->config.vf || priv->config.sf) {
                ret = mlx5_os_set_promisc(dev, 1);
                if (ret)
                        return ret;
@@ -69,7 +69,7 @@ mlx5_promiscuous_disable(struct rte_eth_dev *dev)
        int ret;
 
        dev->data->promiscuous = 0;
-       if (priv->config.vf) {
+       if (priv->config.vf || priv->config.sf) {
                ret = mlx5_os_set_promisc(dev, 0);
                if (ret)
                        return ret;
@@ -109,7 +109,7 @@ mlx5_allmulticast_enable(struct rte_eth_dev *dev)
                        dev->data->port_id);
                return 0;
        }
-       if (priv->config.vf) {
+       if (priv->config.vf || priv->config.sf) {
                ret = mlx5_os_set_allmulti(dev, 1);
                if (ret)
                        goto error;
@@ -142,7 +142,7 @@ mlx5_allmulticast_disable(struct rte_eth_dev *dev)
        int ret;
 
        dev->data->all_multicast = 0;
-       if (priv->config.vf) {
+       if (priv->config.vf || priv->config.sf) {
                ret = mlx5_os_set_allmulti(dev, 0);
                if (ret)
                        goto error;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 6c8a64ce03..e4e057a6f8 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1259,7 +1259,7 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
                }
                mlx5_txq_release(dev, i);
        }
-       if (priv->config.dv_esw_en && !priv->config.vf) {
+       if (priv->config.dv_esw_en && !priv->config.vf && !priv->config.sf) {
                if (mlx5_flow_create_esw_table_zero_flow(dev))
                        priv->fdb_def_rule = 1;
                else
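
Sketch (not part of the patch): with the new sf bit, an SF is treated like a
VF in two places. First, mlx5_rxmode.c also pushes promiscuous and
all-multicast changes to the kernel netdev. Second, mlx5_traffic_enable()
skips the E-Switch table zero default flow. struct sketch_cfg is a
hypothetical mirror of the relevant mlx5_dev_config bits.

```c
#include <stdbool.h>

/* Hypothetical mirror of the mlx5_dev_config bits involved. */
struct sketch_cfg {
	unsigned int vf:1;        /* port is a VF */
	unsigned int sf:1;        /* port is an SF (new in this patch) */
	unsigned int dv_esw_en:1; /* E-Switch DV flow support enabled */
};

/*
 * mlx5_rxmode.c: promiscuous/all-multicast changes are also pushed to
 * the kernel netdev (mlx5_os_set_promisc()/mlx5_os_set_allmulti()).
 */
static bool
sketch_uses_kernel_rxmode(const struct sketch_cfg *cfg)
{
	return cfg->vf || cfg->sf;
}

/*
 * mlx5_trigger.c: only a port that is neither a VF nor an SF tries
 * mlx5_flow_create_esw_table_zero_flow().
 */
static bool
sketch_creates_esw_root_rule(const struct sketch_cfg *cfg)
{
	return cfg->dv_esw_en && !cfg->vf && !cfg->sf;
}
```
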
-- 
2.25.1

[dpdk-dev] [RFC 14/14] net/mlx5: support SubFunction
