This series adds TLP Processing Hints (TPH) support to the VFIO dma-buf
export path, allowing importing drivers (e.g. mlx5) to use the
exporter's steering tag when performing peer-to-peer DMA into a
VFIO-owned device.
There is no separate in-tree vendor kernel driver for the target device:
vfio-pci is the in-tree driver and the targeted device is managed
from userspace via VFIO passthrough. That is why the ST has to flow
through a uAPI: userspace owns the device and its ST table, so it is the
entity that can publish a meaningful value for a given dma-buf. The
kernel-visible participants are still in-tree: vfio-pci exports the
dma-buf and mlx5 imports it.
On the effect: the endpoint's PCIe ingress block uses the 8-bit ST as
an in-band instruction for the incoming P2P TLP -- selecting a target
cache partition and, on writes, an in-flight operation on the data
before it lands. The dma-buf callback keeps this opaque to the
framework -- only the producer (userspace owner of the VFIO device)
and the consumer (endpoint block) need to interpret the value. The
dma-buf get_tph callback itself is optional, but for workloads that
depend on the endpoint's in-flight operation, falling back to DMA
without the ST does not produce the same result.
The dma-buf hook is intentionally generic and discoverable rather than
a private side channel. The exporter owns the completing address
space for the dma-buf and decides whether it can provide a meaningful
ST/PH tuple for that completer; the dma-buf core keeps the tuple opaque,
and importers merely request the namespace they support and place the
returned value on generated TLPs. Exporters that cannot derive a
meaningful tuple simply return -EOPNOTSUPP.
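As a rough illustration of the optional-callback contract described
above, here is a minimal userspace sketch. It is not the code from
patch 3: struct buf, struct buf_ops, struct tph_tuple, buf_get_tph()
and importer_pick_st() are hypothetical stand-ins, the namespace
argument is left out, and treating ST=0 as "no steering tag" is an
assumption made only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ST/PH tuple; field names are illustrative. */
struct tph_tuple {
	uint16_t st;	/* steering tag, opaque to the framework */
	uint8_t ph;	/* processing hint */
};

struct buf;

/* Hypothetical exporter ops table; get_tph may be left NULL. */
struct buf_ops {
	int (*get_tph)(struct buf *b, struct tph_tuple *out);
};

struct buf {
	const struct buf_ops *ops;
	struct tph_tuple tph;	/* value published by the owner, if any */
	int tph_valid;
};

/* Framework-side wrapper: a missing callback means "not supported". */
static int buf_get_tph(struct buf *b, struct tph_tuple *out)
{
	assert(b && b->ops && out);
	if (!b->ops->get_tph)
		return -EOPNOTSUPP;
	return b->ops->get_tph(b, out);
}

/* Exporter callback: report a tuple only if one was published. */
static int exporter_get_tph(struct buf *b, struct tph_tuple *out)
{
	if (!b->tph_valid)
		return -EOPNOTSUPP;
	*out = b->tph;
	return 0;
}

/* Importer side: continue without a hint when the exporter declines. */
static int importer_pick_st(struct buf *b, uint16_t *st)
{
	struct tph_tuple t;
	int ret = buf_get_tph(b, &t);

	if (ret == -EOPNOTSUPP) {
		*st = 0;	/* assumption: 0 means "no steering tag" here */
		return 0;
	}
	if (ret)
		return ret;
	*st = t.st;	/* value is passed through uninterpreted */
	return 0;
}
```

In this sketch the wrapper never inspects the tuple; it only forwards
it, which mirrors the intent that the framework keeps the value opaque.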
Patch 1 is a fix for a pre-existing bug, split out as its own patch:
mlx5_st_dealloc_index() removed the xarray entry but never freed the
backing struct, so repeated alloc/dealloc cycles leaked memory.
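The leak pattern can be sketched in plain userspace C. This is not the
mlx5 code: a fixed array stands in for the xarray, and struct idx_data,
idx_alloc(), idx_dealloc() and the live_allocs counter are hypothetical
names used only to show that the final dealloc must both drop the table
entry and free the backing struct.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_IDX 8

/* Hypothetical per-index bookkeeping, standing in for the backing struct. */
struct idx_data {
	unsigned int refcount;
	unsigned int tag;
};

static struct idx_data *table[MAX_IDX];	/* stand-in for the xarray */
static int live_allocs;			/* demo-only leak counter */

/* Take a reference on idx, allocating the backing struct on first use. */
static int idx_alloc(unsigned int idx, unsigned int tag)
{
	struct idx_data *d;

	if (idx >= MAX_IDX)
		return -1;
	d = table[idx];
	if (d) {
		d->refcount++;
		return 0;
	}
	d = malloc(sizeof(*d));
	if (!d)
		return -1;
	live_allocs++;
	d->refcount = 1;
	d->tag = tag;
	table[idx] = d;
	return 0;
}

/* Drop a reference; the final put removes the entry AND frees the struct. */
static int idx_dealloc(unsigned int idx)
{
	struct idx_data *d;

	if (idx >= MAX_IDX || !table[idx])
		return -1;
	d = table[idx];
	assert(d->refcount > 0);
	if (--d->refcount)
		return 0;
	table[idx] = NULL;	/* remove the entry ... */
	free(d);		/* ... and free it; omitting this is the leak */
	live_allocs--;
	return 0;
}
```

With the free() in place, any number of alloc/dealloc cycles on the
same index leaves live_allocs at zero; without it, each cycle would
leave one struct behind.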
Patch 2 exposes the enabled TPH requester type through a small PCI/TPH
helper, plus a pcie_tph_supported() helper so consumers don't reach
into pci_dev internals (and so callers in CONFIG_PCIE_TPH=n builds
get a clean -EOPNOTSUPP path).
Patch 3 adds the optional dma_buf_ops::get_tph callback to the dma-buf
framework so importers can fetch TPH metadata from an exporter.
Patch 4 implements get_tph in vfio-pci and adds the new uAPI
(VFIO_DEVICE_FEATURE_DMA_BUF_TPH) for userspace to attach the metadata.
Patch 5 wires up the mlx5 RDMA driver as a consumer.
Build-tested with both CONFIG_PCIE_TPH=y and CONFIG_PCIE_TPH=n.
Functional validation on the target topology: PCIe analyzer captures
on the P2P TLPs confirm the ST emitted by mlx5 matches the value
published through VFIO_DEVICE_FEATURE_DMA_BUF_TPH, and the end-to-end
P2P workload only produces results consistent with the endpoint's
ST-selected in-flight operation. For example, with userspace
publishing 8-bit ST=0xf0 and PH=2, an analyzer capture of a peer-to-
peer MWr64 shows "STP MWr64 TC=0 OHC=2 ..." followed by "OHC-B
ST=F0h PH=2 HV=1":
(TLP Captures)
08000260 -> STP MWr64 TC=0 OHC=2 TS=0 Attr=0 L=8
F0000004 -> RID=4h:0h.0h EP- Tag=F0h
E0200000 -> AddrH=000020E0h
00080006 -> AddrL=06000800h
90F00000 -> OHC-B ST=F0h PH=2 HV=1 AMA=0 AV-
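For readers cross-checking the raw dwords against the decoded fields,
the following sketch reproduces the address and ST values by hand. The
byte positions are inferred only from the capture lines above (the
analyzer prints each dword byte-swapped relative to the decoded value,
and the ST appears as the second printed byte of the OHC-B dword); they
are not taken from the PCIe specification, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Combine the two address dwords as printed in the capture. */
static uint64_t capture_addr(uint32_t raw_hi, uint32_t raw_lo)
{
	return ((uint64_t)swap32(raw_hi) << 32) | swap32(raw_lo);
}

/*
 * ASSUMPTION (from this capture only): the ST is the second byte of the
 * OHC-B dword as printed by the analyzer.
 */
static uint8_t capture_ohcb_st(uint32_t raw)
{
	return (uint8_t)((raw >> 16) & 0xff);
}
```

Feeding in E0200000 and 00080006 yields AddrH=000020E0h and
AddrL=06000800h, and 90F00000 yields ST=F0h, matching the decoded
column and the value published by userspace.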
Previous versions:
v5: https://lore.kernel.org/dri-devel/[email protected]/
v4: https://lore.kernel.org/linux-pci/[email protected]/
v3: https://lore.kernel.org/linux-pci/[email protected]/
v2: https://lore.kernel.org/linux-pci/[email protected]/
Zhiping Zhang (5):
net/mlx5: free mlx5_st_idx_data on final dealloc
PCI/TPH: expose the enabled TPH requester type and capability helpers
dma-buf: add optional get_tph() callback
vfio/pci: implement get_tph and DMA_BUF_TPH feature
RDMA/mlx5: get tph for p2p access when registering dma-buf mr
drivers/infiniband/core/frmr_pools.c | 20 ++-
drivers/infiniband/hw/mlx5/mr.c | 124 +++++++++++++++++-
.../net/ethernet/mellanox/mlx5/core/lib/st.c | 50 +++++--
drivers/pci/tph.c | 25 ++++
drivers/vfio/pci/vfio_pci_core.c | 3 +
drivers/vfio/pci/vfio_pci_dmabuf.c | 92 ++++++++++++-
drivers/vfio/pci/vfio_pci_priv.h | 12 ++
include/linux/dma-buf.h | 31 +++++
include/linux/mlx5/driver.h | 12 ++
include/linux/pci-tph.h | 7 +
include/rdma/frmr_pools.h | 5 +-
include/uapi/linux/vfio.h | 45 +++++++
12 files changed, 406 insertions(+), 20 deletions(-)
--
2.53.0-Meta