On Thu, May 28, 2026 at 12:46 AM Christian König
<[email protected]> wrote:
>
> On 5/28/26 06:55, Zhiping Zhang wrote:
> > On Wed, May 27, 2026 at 5:53 AM Christian König
> > <[email protected]> wrote:
> >>
> >> On 5/27/26 14:36, Jason Gunthorpe wrote:
> >>> On Wed, May 27, 2026 at 02:23:46PM +0200, Christian König wrote:
> >>>
> >>>> Yeah that's a good point, I should probably rephrase the question.
> >>>>
> >>>> I'm aware of how TPH works by adding the extra ST to the TLP.
> >>>>
> >>>> But my question is how is that useful to a PCIe endpoint? What is the
> >>>> effect of the ST here?
> >>>
> >>> TBH I've never heard Meta explain what their device is doing with
> >>> it. At least it seems to be super important to their device..
> >>
> >> Yeah I think at least a brief description of what is going on here would
> >> be necessary for the review.
> >>
> >> Otherwise we have only the info that the exporter wants to give an opaque
> >> ST for the importer to use and no technical description what that is good
> >> for, how to test it etc...
> >>
> >> Regards,
> >> Christian.
> >>
> >>>
> >>> Jason
> >>
> >
> > Fair point - I'll add a couple of paragraphs to the v6 cover letter and the
> > patch's changelog as well.
> >
> > The short version: in this series the vfio-pci device is the completer
> > of the P2P writes and mlx5 is the requester. As Jason noted, ST semantics
> > on the completer are implementation-defined, so only the driver that owns
> > the completer (here, vfio-pci on behalf of its userspace owner) can hand
> > out a meaningful ST; the importer treats it as opaque and just places it
> > on the TLP.
>
> Yeah but that is not really sufficient to justify a driver 2 driver interface.
>
> Which PF driver is backing the vfio-pci and what effect does sending TLPs
> with ST to it compared to TLPs without an ST?
>
> Regards,
> Christian.
>
There's no in-tree vendor PF driver: the device is a Meta MTIA accelerator
managed entirely from userspace via VFIO passthrough. That's why the ST has
to flow through a uAPI: userspace owns the device and its ST table, so it's
the only entity that can publish a meaningful value for a given dma-buf.

On the effect: the endpoint's PCIe ingress block uses the 8-bit ST as an
in-band instruction for the incoming P2P TLP. It selects a target cache
partition and, on writes, an in-flight operation to apply to the data
before it lands. The dma-buf callback keeps this opaque to the framework;
only the producer (the userspace owner of the VFIO device) and the consumer
(the endpoint's ingress block) need to interpret the value.

I'll include this explanation in the v6 cover letter.

Thanks,
Zhiping
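
P.S. To make the "in-band instruction" idea more concrete, here is a purely
illustrative sketch in C. The field split (low 4 bits for the cache
partition, high 4 bits for the write operation) and every name in it
(ST_PART_BITS, st_pack(), st_partition(), st_write_op()) are assumptions
made up for this example; they are not the MTIA encoding, which stays
private to the device and its userspace owner.

```c
#include <assert.h>
#include <stdint.h>

/*
 * HYPOTHETICAL layout, for illustration only. The real ST encoding is
 * implementation-defined and known only to the device and its owner.
 */
#define ST_PART_BITS 4u                           /* assumed field width */
#define ST_PART_MASK ((1u << ST_PART_BITS) - 1u)  /* 0x0f */
#define ST_OP_MASK   0x0fu                        /* assumed field width */

/* Producer side: the userspace owner composes the 8-bit tag it publishes. */
static inline uint8_t st_pack(unsigned int part, unsigned int op)
{
	assert(part <= ST_PART_MASK && op <= ST_OP_MASK);
	return (uint8_t)((op << ST_PART_BITS) | part);
}

/* Consumer side: the endpoint ingress block decodes the same tag. */
static inline unsigned int st_partition(uint8_t st)
{
	return st & ST_PART_MASK;
}

static inline unsigned int st_write_op(uint8_t st)
{
	return (unsigned int)st >> ST_PART_BITS;
}
```

The importer (mlx5 in this series) would never call the decode helpers; it
only carries the 8-bit value and places it on the TLP, which is why the
dma-buf interface can treat the ST as opaque.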
