On Mon, Jun 29, 2026 at 12:11 AM Michael Gur <[email protected]> wrote:
>
> On 6/22/2026 9:41 PM, Zhiping Zhang wrote:
> >
> > Peer-to-peer DMA between a mlx5 NIC and a foreign PCIe endpoint
> > (typically a GPU or a vfio-pci passthrough device) traverses the host
> > PCIe fabric. The endpoint exporting the dma-buf knows which PCIe TLP
> > Processing Hint (TPH) Steering Tag yields the best placement for the
> > traffic it will sink: per-endpoint hint selection lets the root complex
> > or switch direct DMA to a specific cache slice / NUMA node, cutting
> > cross-socket snoop traffic and DRAM pressure under sustained p2p
> > workloads.
> >
> > Until now the mlx5 importer had no way to learn the exporter's chosen
> > ST tag, so dma-buf MRs were registered without TPH and ran with the
> > default (no-hint) routing. With dma_buf_get_pci_tph() in place this
> > patch wires up mlx5_ib to query that metadata at MR registration time
> > for p2p access and use it to program requester-side TPH on the outbound
> > mkey. If the exporter has no metadata, fall back to the existing
> > no-TPH path so behavior for non-TPH-aware exporters is unchanged.
> >
> > Use mlx5_st_alloc_index_by_tag() to translate exporter-provided
> > steering tags into local ST entries when table mode is active, and add
> > mlx5_st_get_index() for DMAH-backed flows that already carry an ST
> > index.
> >
> > For TPH-backed FRMRs, keep the extra ST-table reference tied to MR
> > lifetime rather than pooled mkey lifetime. Acquire the ref before MR
> > creation and release it again when the MR is returned to the pool or
> > the backing mkey is destroyed, while leaving the generic FRMR pool
> > core unchanged.
> >
> > Import the DMA_BUF namespace for the new dma_buf_get_pci_tph() call so
> > modular mlx5_ib builds link cleanly.
> >
> > Signed-off-by: Zhiping Zhang <[email protected]>
> > ---
> >   drivers/infiniband/hw/mlx5/main.c             |   1 +
> >   drivers/infiniband/hw/mlx5/mr.c               | 103 +++++++++++++++++-
> >   .../net/ethernet/mellanox/mlx5/core/lib/st.c  |  49 +++++++--
> >   include/linux/mlx5/driver.h                   |  13 ++
> >   4 files changed, 157 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> > index 02809114fc79..a2b497f6b16b 100644
> > --- a/drivers/infiniband/hw/mlx5/main.c
> > +++ b/drivers/infiniband/hw/mlx5/main.c
> > @@ -60,6 +60,7 @@
> >   MODULE_AUTHOR("Eli Cohen <[email protected]>");
> >   MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) IB driver");
> >   MODULE_LICENSE("Dual BSD/GPL");
> > +MODULE_IMPORT_NS("DMA_BUF");
> >
> >   struct mlx5_ib_event_work {
> >          struct work_struct      work;
> > diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> > index e6b74955d95d..7aced3f55456 100644
> > --- a/drivers/infiniband/hw/mlx5/mr.c
> > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > @@ -39,6 +39,7 @@
> >   #include <linux/delay.h>
> >   #include <linux/dma-buf.h>
> >   #include <linux/dma-resv.h>
> > +#include <linux/pci-tph.h>
> >   #include <rdma/frmr_pools.h>
> >   #include <rdma/ib_umem_odp.h>
> >   #include "dm.h"
> > @@ -167,6 +168,32 @@ static int get_unchangeable_access_flags(struct mlx5_ib_dev *dev,
> >   #define MLX5_FRMR_POOLS_KERNEL_KEY_PH_MASK GENMASK_ULL(23, 16)
> >   #define MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK GENMASK_ULL(15, 0)
> >
> > +static int mlx5_ib_get_frmr_st_handle_ref(struct mlx5_ib_dev *dev,
> > +                                         u16 st_index)
> > +{
> > +       if (st_index == MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX)
> > +               return 0;
> > +
> > +       return mlx5_st_get_index(dev->mdev, st_index);
> > +}
> > +
> > +static void mlx5_ib_put_st_index_ref(struct mlx5_ib_dev *dev, u16 st_index)
> > +{
> > +       if (st_index == MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX)
> > +               return;
> > +
> > +       mlx5_st_dealloc_index(dev->mdev, st_index);
> > +}
> > +
> > +static void mlx5_ib_put_frmr_st_handle_ref(struct mlx5_ib_dev *dev,
> > +                                          u64 kernel_vendor_key)
> > +{
> > +       u16 st_index = FIELD_GET(MLX5_FRMR_POOLS_KERNEL_KEY_ST_INDEX_MASK,
> > +                                kernel_vendor_key);
> > +
> > +       mlx5_ib_put_st_index_ref(dev, st_index);
> > +}
> > +
>
> Please remove the _frmr_ from the function names.
> This is now unrelated to the frmr and is strictly tied to MRs.
>

Agreed. Will rename for v10:
    mlx5_ib_get_frmr_st_handle_ref() -> mlx5_ib_get_st_handle_ref()
    mlx5_ib_put_frmr_st_handle_ref() -> mlx5_ib_put_st_handle_ref()
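
For illustration only, here is a standalone userspace sketch of how the
renamed helpers fit together. It is not the v10 code: the dev argument is
dropped, the mlx5_st_*() calls are replaced by stub functions over a small
refcount array, and the sentinel value and table size are assumptions. Only
the 15:0 ST-index mask is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX. */
#define NO_STEERING_TAG_INDEX 0u
/* Bits 15:0 of kernel_vendor_key carry the ST index (mask from the patch). */
#define KEY_ST_INDEX_MASK 0xFFFFull
/* Hypothetical table size, for the stub only. */
#define ST_TABLE_SIZE 8

/* Stub refcount table standing in for the mlx5 core ST table. */
static unsigned int st_refs[ST_TABLE_SIZE];

/* Stub for mlx5_st_get_index(): take a ref on an already-allocated entry. */
static int stub_st_get_index(uint16_t st_index)
{
	if (st_index >= ST_TABLE_SIZE || st_refs[st_index] == 0)
		return -1;
	st_refs[st_index]++;
	return 0;
}

/* Stub for mlx5_st_dealloc_index(): drop one ref. */
static void stub_st_dealloc_index(uint16_t st_index)
{
	assert(st_index < ST_TABLE_SIZE && st_refs[st_index] > 0);
	st_refs[st_index]--;
}

/* Renamed from mlx5_ib_get_frmr_st_handle_ref(); dev argument omitted. */
static int mlx5_ib_get_st_handle_ref(uint16_t st_index)
{
	/* No steering tag means there is nothing to reference. */
	if (st_index == NO_STEERING_TAG_INDEX)
		return 0;

	return stub_st_get_index(st_index);
}

static void mlx5_ib_put_st_index_ref(uint16_t st_index)
{
	if (st_index == NO_STEERING_TAG_INDEX)
		return;

	stub_st_dealloc_index(st_index);
}

/* Renamed from mlx5_ib_put_frmr_st_handle_ref(); dev argument omitted. */
static void mlx5_ib_put_st_handle_ref(uint64_t kernel_vendor_key)
{
	uint16_t st_index = (uint16_t)(kernel_vendor_key & KEY_ST_INDEX_MASK);

	mlx5_ib_put_st_index_ref(st_index);
}
```

The get/put pair is symmetric: both short-circuit on the no-steering-tag
sentinel, so callers do not need to special-case MRs registered without TPH.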

> ....
>
> > @@ -335,6 +364,7 @@ static int mlx5r_build_frmr_key(struct ib_device *device,
> >                  get_unchangeable_access_flags(dev, in->access_flags);
> >          out->vendor_key = in->vendor_key;
> >          out->num_dma_blocks = in->num_dma_blocks;
> > +       out->kernel_vendor_key = in->kernel_vendor_key;
>
> This path translates an frmr key passed from user-space into the
> right values, which the driver enforces and masks.
> kernel_vendor_key must not be set in this path, as user-space is not
> allowed to control it.
> Please drop this line.
>
> Thanks,
> Michael
>

Good catch, will drop.

Thanks,
Zhiping
