On 6/22/2026 9:41 PM, Zhiping Zhang wrote:

Peer-to-peer DMA between a mlx5 NIC and a foreign PCIe endpoint
(typically a GPU or a vfio-pci passthrough device) traverses the host
PCIe fabric. The endpoint exporting the dma-buf knows which PCIe TLP
Processing Hint (TPH) Steering Tag yields the best placement for the
traffic it will sink: per-endpoint hint selection lets the root complex
or switch direct DMA to a specific cache slice / NUMA node, cutting
cross-socket snoop traffic and DRAM pressure under sustained p2p
workloads.

Until now the mlx5 importer had no way to learn the exporter's chosen
ST tag, so dma-buf MRs were registered without TPH and ran with the
default (no-hint) routing. With dma_buf_get_pci_tph() in place this
patch wires up mlx5_ib to query that metadata at MR registration time
for p2p access and use it to program requester-side TPH on the outbound
mkey. If the exporter has no metadata, fall back to the existing
no-TPH path so behavior for non-TPH-aware exporters is unchanged.

Use mlx5_st_alloc_index_by_tag() to translate exporter-provided
steering tags into local ST entries when table mode is active, and add
mlx5_st_get_index() for DMAH-backed flows that already carry an ST
index.

For TPH-backed FRMRs, keep the extra ST-table reference tied to MR
lifetime rather than pooled mkey lifetime. Acquire the ref before MR
creation and release it again when the MR is returned to the pool or
the backing mkey is destroyed, while leaving the generic FRMR pool
core unchanged.

Import the DMA_BUF namespace for the new dma_buf_get_pci_tph() call so
modular mlx5_ib builds link cleanly.

Signed-off-by: Zhiping Zhang <[email protected]>
---
  drivers/infiniband/hw/mlx5/main.c             |   1 +
  drivers/infiniband/hw/mlx5/mr.c               | 103 +++++++++++++++++-
  .../net/ethernet/mellanox/mlx5/core/lib/st.c  |  49 +++++++--
  include/linux/mlx5/driver.h                   |  13 ++
  4 files changed, 157 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 02809114fc79..a2b497f6b16b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -60,6 +60,7 @@
  MODULE_AUTHOR("Eli Cohen <[email protected]>");
  MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) IB driver");
  MODULE_LICENSE("Dual BSD/GPL");
+MODULE_IMPORT_NS("DMA_BUF");

  struct mlx5_ib_event_work {
         struct work_struct      work;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index e6b74955d95d..7aced3f55456 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -39,6 +39,7 @@
  #include <linux/delay.h>
  #include <linux/dma-buf.h>
  #include <linux/dma-resv.h>
+#include <linux/pci-tph.h>
  #include <rdma/frmr_pools.h>
  #include <rdma/ib_umem_odp.h>
  #include "dm.h"
@@ -167,6 +168,32 @@ static int get_unchangeable_access_flags(struct mlx5_ib_dev *dev,
  #define MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK GENMASK_ULL(23, 16)
  #define MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK GENMASK_ULL(15, 0)

+static int mlx5_ib_get_frmr_st_handle_ref(struct mlx5_ib_dev *dev,
+                                         u16 st_index)
+{
+       if (st_index == MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX)
+               return 0;
+
+       return mlx5_st_get_index(dev->mdev, st_index);
+}
+
+static void mlx5_ib_put_st_index_ref(struct mlx5_ib_dev *dev, u16 st_index)
+{
+       if (st_index == MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX)
+               return;
+
+       mlx5_st_dealloc_index(dev->mdev, st_index);
+}
+
+static void mlx5_ib_put_frmr_st_handle_ref(struct mlx5_ib_dev *dev,
+                                          u64 kernel_vendor_key)
+{
+       u16 st_index = FIELD_GET(MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK,
+                                kernel_vendor_key);
+
+       mlx5_ib_put_st_index_ref(dev, st_index);
+}
+

Please remove _frmr_ from these function names.
This is now unrelated to the frmr and is strictly tied to MRs.

....

@@ -335,6 +364,7 @@ static int mlx5r_build_frmr_key(struct ib_device *device,
                 get_unchangeable_access_flags(dev, in->access_flags);
         out->vendor_key = in->vendor_key;
         out->num_dma_blocks = in->num_dma_blocks;
+       out->kernel_vendor_key = in->kernel_vendor_key;

This path translates an frmr key passed from user-space into the right
values, enforced and masked by the driver. kernel_vendor_key must not be
copied in this path, since user-space is not allowed to control it.
Please drop this line.
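
To illustrate the concern, here is a rough user-space model, not the driver
code. The struct, macro, and function names below are hypothetical; only the
bit layout (PH in bits 23:16, ST index in bits 15:0) is taken from the masks
in the quoted hunk. The model also copies access_flags unmodified, whereas the
driver masks them, and it assumes that a zero kernel_vendor_key means "no
kernel-owned hint". The point is only that a key built from user input should
never inherit the kernel-owned field:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an frmr pool key; field names mirror the hunk. */
struct frmr_key_model {
	uint64_t vendor_key;        /* user-space may influence this */
	uint64_t kernel_vendor_key; /* kernel-owned: PH and ST index */
	uint64_t num_dma_blocks;
	uint32_t access_flags;
};

/* Same bit ranges as GENMASK_ULL(23, 16) and GENMASK_ULL(15, 0). */
#define KEY_MODEL_PH_MASK       0x0000000000ff0000ULL
#define KEY_MODEL_ST_INDEX_MASK 0x000000000000ffffULL

/* Extract the ST index from a kernel-owned key (bits 15:0). */
static uint16_t key_model_st_index(uint64_t kernel_vendor_key)
{
	return (uint16_t)(kernel_vendor_key & KEY_MODEL_ST_INDEX_MASK);
}

/* Extract the processing hint from a kernel-owned key (bits 23:16). */
static uint8_t key_model_ph(uint64_t kernel_vendor_key)
{
	return (uint8_t)((kernel_vendor_key & KEY_MODEL_PH_MASK) >> 16);
}

/*
 * Build a key from user-supplied input. Only user-controllable fields
 * are copied; kernel_vendor_key is forced to zero (an assumption of
 * this model) so user-space cannot select a PH or ST index.
 */
static void build_user_key_model(const struct frmr_key_model *in,
				 struct frmr_key_model *out)
{
	assert(in && out);

	out->access_flags = in->access_flags;
	out->vendor_key = in->vendor_key;
	out->num_dma_blocks = in->num_dma_blocks;
	out->kernel_vendor_key = 0;
}
```

With this shape, whatever value user-space places in kernel_vendor_key is
ignored, and the ST index/PH can only be set by kernel callers that build the
key themselves.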

Thanks,
Michael

