This series adds TLP Processing Hints (TPH) support to the VFIO dma-buf
export path, allowing importing drivers (e.g. mlx5) to use the
exporter's steering tag (ST) when performing peer-to-peer DMA into a
VFIO-owned device.

There is no separate in-tree vendor kernel driver for the target device:
vfio-pci is its in-tree driver, and the device is managed from
userspace via VFIO passthrough. That is why the ST has to flow through
a uAPI: userspace owns the device and its ST table, so it is the entity
that can publish a meaningful ST for a given dma-buf. The
kernel-visible participants are still in-tree: vfio-pci exports the
dma-buf and mlx5 imports it.

On the effect of the ST: the endpoint's PCIe ingress block uses the
8-bit ST as an in-band instruction for the incoming P2P TLP -- it
selects a target cache partition and, for writes, an in-flight
operation applied to the data before it lands. The dma-buf callback
keeps this value opaque to the framework; only the producer (the
userspace owner of the VFIO device) and the consumer (the endpoint
block) need to interpret it. The get_tph callback itself is optional,
but workloads that depend on the endpoint's in-flight operation cannot
rely on the fallback path (P2P DMA without an ST), because it does not
produce the same result.

The dma-buf hook is intentionally generic and discoverable rather than
a private side channel. The exporter owns the completer's address space
behind the dma-buf, so it decides whether it can provide a meaningful
ST/PH (Steering Tag / Processing Hint) tuple for that completer. The
dma-buf core keeps the tuple opaque; importers simply request the
namespace they support and place the returned value on the TLPs they
generate. Exporters that cannot derive a meaningful tuple return
-EOPNOTSUPP.
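The importer-side contract described above (optional callback, opaque
tuple, -EOPNOTSUPP when the exporter cannot help) can be modeled in a
few lines of userspace C. This is a sketch under stated assumptions:
model_ops, model_buf_get_tph and the model_st_ns enum are hypothetical,
the real dma_buf_ops::get_tph signature may differ, and modeling the
"namespace" as the ST width is purely an illustrative choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the "namespace" is modeled as ST width for illustration. */
enum model_st_ns { MODEL_ST_NS_8BIT, MODEL_ST_NS_16BIT };

struct model_tph_tuple {
	uint16_t st; /* opaque to the framework */
	uint8_t ph;
};

struct model_ops {
	/* Optional: NULL means the exporter never offers TPH. */
	int (*get_tph)(void *priv, enum model_st_ns ns,
		       struct model_tph_tuple *out);
};

struct model_buf {
	const struct model_ops *ops;
	void *priv;
};

/* Framework helper: a missing callback is reported like "no tuple". */
static int model_buf_get_tph(const struct model_buf *buf, enum model_st_ns ns,
			     struct model_tph_tuple *out)
{
	assert(buf != NULL && out != NULL);
	if (!buf->ops || !buf->ops->get_tph)
		return -EOPNOTSUPP;
	return buf->ops->get_tph(buf->priv, ns, out);
}

/* Example exporter that can only supply tuples in the 8-bit namespace. */
static int model_exporter_get_tph(void *priv, enum model_st_ns ns,
				  struct model_tph_tuple *out)
{
	const struct model_tph_tuple *published = priv;

	if (ns != MODEL_ST_NS_8BIT || published == NULL)
		return -EOPNOTSUPP;
	*out = *published; /* copied verbatim, never interpreted */
	return 0;
}
```

An importer calls model_buf_get_tph() with the namespace it supports;
on 0 it puts the tuple on its generated TLPs, and on -EOPNOTSUPP it
issues plain P2P DMA without a hint.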

Patch 1 fixes a pre-existing bug and is split out from the rest of the
series: mlx5_st_dealloc_index() removed the xarray entry but never
freed the backing struct, so repeated alloc/dealloc cycles leaked
memory.
Patch 2 exposes the enabled TPH requester type through a small PCI/TPH
helper, plus a pcie_tph_supported() helper so consumers don't reach
into pci_dev internals (and so callers in CONFIG_PCIE_TPH=n builds
get a clean -EOPNOTSUPP path).
Patch 3 adds the optional dma_buf_ops::get_tph callback to the dma-buf
framework so importers can fetch TPH metadata from an exporter.
Patch 4 implements get_tph in vfio-pci and adds the new uAPI
(VFIO_DEVICE_FEATURE_DMA_BUF_TPH) for userspace to attach the metadata.
Patch 5 wires up the mlx5 RDMA driver as a consumer.
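The CONFIG_PCIE_TPH=n behavior mentioned for patch 2 follows the usual
kernel pattern of a config-dependent helper with an inline stub, so
callers need no #ifdef. A userspace approximation of that pattern is
below; model_pcie_tph_supported, model_pci_dev and the
MODEL_CONFIG_PCIE_TPH macro are hypothetical stand-ins, and the real
pcie_tph_supported() signature and return type may differ.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev; only what the sketch needs. */
struct model_pci_dev {
	bool tph_cap_present;
};

#ifdef MODEL_CONFIG_PCIE_TPH
static inline int model_pcie_tph_supported(const struct model_pci_dev *pdev)
{
	/* A real helper would consult the device's TPH capability. */
	return pdev->tph_cap_present ? 0 : -EOPNOTSUPP;
}
#else
static inline int model_pcie_tph_supported(const struct model_pci_dev *pdev)
{
	(void)pdev;
	return -EOPNOTSUPP; /* TPH compiled out: callers take the no-TPH path */
}
#endif

/* Consumer code stays #ifdef-free: it only checks the return value. */
static bool model_should_emit_tph(const struct model_pci_dev *pdev)
{
	return model_pcie_tph_supported(pdev) == 0;
}
```

Built without MODEL_CONFIG_PCIE_TPH, even a device that advertises the
capability is reported as unsupported, which mirrors the clean
-EOPNOTSUPP path described for CONFIG_PCIE_TPH=n builds.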

Build-tested with both CONFIG_PCIE_TPH=y and CONFIG_PCIE_TPH=n.
Functional validation on the target topology: PCIe analyzer captures
of the P2P TLPs confirm that the ST emitted by mlx5 matches the value
published through VFIO_DEVICE_FEATURE_DMA_BUF_TPH, and the end-to-end
P2P workload produces only results consistent with the endpoint's
ST-selected in-flight operation. For example, with userspace
publishing the 8-bit ST=0xf0 and PH=2, an analyzer capture of a
peer-to-peer MWr64 shows "STP MWr64 TC=0 OHC=2 ..." followed by
"OHC-B ST=F0h PH=2 HV=1":
(TLP capture)
08000260 -> STP MWr64 TC=0 OHC=2 TS=0 Attr=0 L=8
F0000004 -> RID=4h:0h.0h EP- Tag=F0h
E0200000 -> AddrH=000020E0h
00080006 -> AddrL=06000800h
90F00000 -> OHC-B ST=F0h PH=2 HV=1 AMA=0 AV-
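When reading the capture, note that the raw hex appears to be printed
with its bytes in the reverse order of the decoded fields: E0200000
decodes to AddrH=000020E0h and 00080006 to AddrL=06000800h, and
F0000004 read the same way gives 0x040000F0 (RID 0x0400 = 4h:0h.0h,
Tag F0h). Assuming that reading holds, the 64-bit MWr64 target address
can be rebuilt as sketched below. The helper names are hypothetical,
and this makes no claim about the OHC-B bit layout.

```c
#include <stdint.h>

/* Reverse the byte order of one 32-bit word (0xE0200000 -> 0x000020E0). */
static uint32_t model_bswap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) |
	       ((v & 0xff000000u) >> 24);
}

/* Rebuild the MWr64 target address from the raw AddrH/AddrL capture words. */
static uint64_t model_mwr64_addr(uint32_t raw_hi, uint32_t raw_lo)
{
	return ((uint64_t)model_bswap32(raw_hi) << 32) | model_bswap32(raw_lo);
}
```

For this capture, model_mwr64_addr(0xE0200000, 0x00080006) yields
0x000020E006000800.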

Previous versions:
v5: https://lore.kernel.org/dri-devel/[email protected]/
v4: https://lore.kernel.org/linux-pci/[email protected]/
v3: https://lore.kernel.org/linux-pci/[email protected]/
v2: https://lore.kernel.org/linux-pci/[email protected]/

Zhiping Zhang (5):
  net/mlx5: free mlx5_st_idx_data on final dealloc
  PCI/TPH: expose the enabled TPH requester type and capability helpers
  dma-buf: add optional get_tph() callback
  vfio/pci: implement get_tph and DMA_BUF_TPH feature
  RDMA/mlx5: get tph for p2p access when registering dma-buf mr

 drivers/infiniband/core/frmr_pools.c          |  20 ++-
 drivers/infiniband/hw/mlx5/mr.c               | 124 +++++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/lib/st.c  |  50 +++++--
 drivers/pci/tph.c                             |  25 ++++
 drivers/vfio/pci/vfio_pci_core.c              |   3 +
 drivers/vfio/pci/vfio_pci_dmabuf.c            |  92 ++++++++++++-
 drivers/vfio/pci/vfio_pci_priv.h              |  12 ++
 include/linux/dma-buf.h                       |  31 +++++
 include/linux/mlx5/driver.h                   |  12 ++
 include/linux/pci-tph.h                       |   7 +
 include/rdma/frmr_pools.h                     |   5 +-
 include/uapi/linux/vfio.h                     |  45 +++++++
 12 files changed, 406 insertions(+), 20 deletions(-)

-- 
2.53.0-Meta
