Hi Wathsala,

Looking forward to your reply.

Thanks

On 1/8/2026 8:30 AM, fengchengwen wrote:
> Hi Wathsala,
> 
> May I ask whether this patchset is still under development or has been
> stopped?
> 
> PCIe steering tags provide a mechanism for precise data stashing, which
> delivers a performance gain, so I think this is a valuable feature.
> 
> This patchset concludes with the statement: "the PMDs should only
> enable TPH in device-specific mode". I don't think such a restriction
> should be imposed; the framework should be compatible with various
> device capabilities:
> 1. The PCIe protocol defines two modes: one is the interrupt-vector
>    mode, and the other is the device-specific mode. A device may
>    choose to support either one or both.
> 2. If a device supports device-specific mode, it has a large degree of
>    freedom in its implementation, such as locating the ST table in a
>    self-defined place (as in '[PATCH v5 4/4] net/i40e: enable TPH in
>    i40e') and stashing only part of the data (e.g. only the descriptor,
>    the header, or even data at an offset).
> 3. If a device supports only interrupt-vector mode (in which each TLP
>    uses the ST from an ST table entry), we could support it as well; in
>    this framework, it would report only a basic stash capability.
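
To illustrate the three points above, here is a minimal sketch of how the
framework could derive the ST mode from a device's TPH Requester Capability
register instead of hard-coding device-specific mode. The bit positions and
mode values follow my reading of the PCIe specification (bit 1:
interrupt-vector mode, bit 2: device-specific mode). All macro, struct, and
function names are hypothetical and are not part of this patchset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* ST mode support bits of the TPH Requester Capability register
 * (hypothetical macro names). */
#define TPH_CAP_ST_IV (1u << 1) /* interrupt-vector mode supported */
#define TPH_CAP_ST_DS (1u << 2) /* device-specific mode supported */

/* Values of the ST Mode Select field (hypothetical enum names). */
enum tph_st_mode {
	TPH_ST_MODE_NONE = 0,
	TPH_ST_MODE_IV = 1,
	TPH_ST_MODE_DS = 2,
};

/* Result of the selection (hypothetical struct). */
struct tph_choice {
	enum tph_st_mode mode;
	/* 1 if the PMD may advertise its own object types (descriptor,
	 * header, payload, offset); 0 if only basic stashing is reported. */
	int pmd_defined_objects;
};

/* Prefer device-specific mode, fall back to interrupt-vector mode,
 * and report -ENOTSUP only when neither mode is available. */
static int
tph_choose_mode(uint32_t tph_cap, struct tph_choice *out)
{
	assert(out != NULL);

	if (tph_cap & TPH_CAP_ST_DS) {
		out->mode = TPH_ST_MODE_DS;
		out->pmd_defined_objects = 1;
		return 0;
	}
	if (tph_cap & TPH_CAP_ST_IV) {
		out->mode = TPH_ST_MODE_IV;
		out->pmd_defined_objects = 0;
		return 0;
	}
	out->mode = TPH_ST_MODE_NONE;
	out->pmd_defined_objects = 0;
	return -ENOTSUP;
}
```

With this shape, a device that supports both modes would still end up in
device-specific mode, while a device that supports only interrupt-vector
mode would no longer be rejected.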
> 
> Thanks
> 
> On 6/3/2025 6:38 AM, Wathsala Vithanage wrote:
>> Today, DPDK applications benefit from Direct Cache Access (DCA) features
>> like Intel DDIO and Arm's write-allocate-to-SLC. However, those features
>> do not allow fine-grained control of direct cache access, such as
>> stashing packets into upper-level caches (L2 caches) of a processor or
>> the shared cache of a chiplet. PCIe TLP Processing Hints (TPH) addresses
>> this need in a vendor-agnostic manner. TPH capability has existed since
>> PCI Express Base Specification revision 3.0; today, numerous Network
>> Interface Cards and interconnects from different vendors support TPH
>> capability. TPH comprises a steering tag (ST) and a processing hint
>> (PH). ST specifies the cache level of a CPU at which the data should be
>> written to (or DCAed into), while PH is a hint provided by the PCIe
>> requester to the completer on an upcoming traffic pattern. Some NIC
>> vendors bundle TPH capability with fine-grained control over the type of
>> objects that can be stashed into CPU caches, such as
>>
>> - Rx/Tx queue descriptors
>> - Packet-headers
>> - Packet-payloads
>> - Data from a given offset from the start of a packet
>>
>> Note that stashable object types are outside the scope of the PCIe
>> standard; therefore, vendors could support any combination of the above
>> items as they see fit.
>>
>> To enable TPH and fine-grained packet stashing, this API extends the
>> ethdev library and the PCI bus driver. In this design, the application
>> provides hints to the PMD via the ethdev stashing API to tell the
>> underlying hardware at which CPU and cache level it prefers a packet to
>> end up. Once the PMD receives a CPU and a cache-level combination (or a
>> list of such combinations), it must extract the matching ST from the PCI
>> bus driver for such combinations. The PCI bus driver implements the TPH
>> functions in an OS-specific way; for Linux, it depends on the TPH
>> capabilities of the VFIO kernel driver.
>>
>> An application uses the cache stashing ethdev API by first calling the
>> rte_eth_dev_stashing_capabilities_get() function to find out what object
>> types can be stashed into a CPU cache by the NIC out of the object types
>> in the bulleted list above. This function takes a port_id and a pointer
>> to a uint16_t to report back the object type flags. The PMD implements
>> the stashing_capabilities_get function pointer in eth_dev_ops. If the
>> underlying platform or the NIC does not support TPH, this function
>> returns -ENOTSUP, and the application should consider any values stored
>> in the object invalid.
>>
>> Once the application knows the supported object types that can be
>> stashed, the next step is to set the steering tags for the packets
>> associated with Rx and Tx queues via
>> rte_eth_dev_stashing_{rx,tx}_config_set() ethdev library functions. Both
>> functions have an identical signature: a port_id, a queue_id, and a
>> config object. The port_id and the queue_id are used to locate the
>> device and the queue. The config object is of type struct
>> rte_eth_stashing_config, which specifies the lcore_id and the
>> cache_level, indicating where objects from this queue should be stashed.
>> The 'objects' field in the config sets the types of objects the
>> application wishes to stash based on the capabilities found earlier.
>> Note that if the 'objects' field includes the flag
>> RTE_ETH_DEV_STASH_OBJECT_OFFSET, the 'offset' field must be used to set
>> the desired offset. These functions invoke PMD implementations of the
>> stashing functionality via the stashing_{rx,tx}_hints_set function
>> callbacks in the eth_dev_ops, respectively.
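
For reference, the application-side flow described above could look roughly
like the following self-contained sketch. It does not use the real ethdev
API: the mock_* functions stand in for
rte_eth_dev_stashing_capabilities_get() and
rte_eth_dev_stashing_rx_config_set(), and struct stash_config stands in for
struct rte_eth_stashing_config. The flag names and values, the field types,
and the mock behaviour are my assumptions; only the field names and the
-ENOTSUP convention come from the cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object-type flags; the cover letter names only
 * RTE_ETH_DEV_STASH_OBJECT_OFFSET. */
#define STASH_OBJECT_DESC    (1u << 0)
#define STASH_OBJECT_HEADER  (1u << 1)
#define STASH_OBJECT_PAYLOAD (1u << 2)
#define STASH_OBJECT_OFFSET  (1u << 3)

/* Stand-in for struct rte_eth_stashing_config (field types assumed). */
struct stash_config {
	uint32_t lcore_id;
	uint8_t cache_level;
	uint16_t objects;
	uint32_t offset;
};

/* Last Rx config accepted by the mock, kept for inspection. */
static struct stash_config last_rx_cfg;

/* Mock capability query: only port 0 supports stashing, and it can
 * stash descriptors and headers. */
static int
mock_stashing_capabilities_get(uint16_t port_id, uint16_t *objects)
{
	assert(objects != NULL);
	if (port_id != 0)
		return -ENOTSUP;
	*objects = STASH_OBJECT_DESC | STASH_OBJECT_HEADER;
	return 0;
}

/* Mock Rx config setter: records the config for port 0. */
static int
mock_stashing_rx_config_set(uint16_t port_id, uint16_t queue_id,
			    const struct stash_config *cfg)
{
	assert(cfg != NULL);
	(void)queue_id;
	if (port_id != 0)
		return -ENOTSUP;
	last_rx_cfg = *cfg;
	return 0;
}

/* Application side: query capabilities, keep only the supported object
 * types, then set the Rx stashing config for one queue. */
static int
app_configure_rx_stashing(uint16_t port_id, uint16_t queue_id,
			  uint32_t lcore_id, uint8_t cache_level,
			  uint16_t wanted, uint32_t offset)
{
	uint16_t caps = 0;
	struct stash_config cfg;
	int ret;

	ret = mock_stashing_capabilities_get(port_id, &caps);
	if (ret != 0)
		return ret; /* caps must be treated as invalid here */

	cfg.lcore_id = lcore_id;
	cfg.cache_level = cache_level;
	cfg.objects = (uint16_t)(wanted & caps);
	/* The offset is meaningful only when the OFFSET flag is set. */
	cfg.offset = (cfg.objects & STASH_OBJECT_OFFSET) ? offset : 0;
	if (cfg.objects == 0)
		return -ENOTSUP;

	return mock_stashing_rx_config_set(port_id, queue_id, &cfg);
}
```

Masking the requested object types against the reported capabilities is one
possible policy; an application could equally choose to fail when any
requested type is unsupported.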
>>
>> The PMD's implementation of the stashing_rx_hints_set() and
>> stashing_tx_hints_set() functions is ultimately responsible for
>> extracting the ST via the API provided by the PCI bus driver. Before
>> extracting STs, the PMD should enable the TPH capability in the endpoint
>> device by calling the rte_pci_tph_enable() function. The PMD then
>> begins the ST extraction process by calling the rte_pci_tph_st_get()
>> function in drivers/bus/pci/rte_bus_pci.h, which returns STs via the
>> same rte_tph_info objects array passed into it as an argument. Once the
>> PMD acquires the ST, the stashing_{rx,tx}_hints_set callbacks
>> implemented in the PMD are ready to set the ST as per the
>> rte_eth_stashing_config object passed to them by the higher-level
>> ethdev functions rte_eth_dev_stashing_{rx,tx}_config_set(). As per the
>> PCIe specification, STs can be placed in the MSI-X table or in a
>> device-specific location. For PMDs, setting the STs on queue contexts is
>> the only viable way of using TPH. Therefore, the PMDs should only enable
>> TPH in device-specific mode.
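
As I understand the PMD-side sequence above (enable TPH, fetch the ST, write
it into the queue context), it could be sketched as follows. This is a
self-contained mock, not the patchset's code: mock_tph_enable() and
mock_tph_st_get() stand in for rte_pci_tph_enable() and
rte_pci_tph_st_get(), whose real signatures I am not assuming here, and the
struct layouts and the ST encoding are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_TPH_MODE_DEV_SPEC 2 /* hypothetical mode value */

/* Stand-in for an rte_tph_info entry (layout assumed). */
struct mock_tph_info {
	uint32_t cpu_id;
	uint8_t cache_level;
	uint16_t st; /* filled in by mock_tph_st_get() */
};

/* Stand-ins for the PCI device and a device-specific Rx queue context. */
struct mock_pci_dev {
	int tph_enabled;
	int tph_mode;
};

struct mock_rxq_ctx {
	uint16_t st;
};

/* Mock of enabling the TPH capability in the endpoint device. */
static int
mock_tph_enable(struct mock_pci_dev *dev, int mode)
{
	assert(dev != NULL);
	dev->tph_enabled = 1;
	dev->tph_mode = mode;
	return 0;
}

/* Mock ST lookup: fails unless TPH is enabled, and returns STs through
 * the same array that was passed in. The ST encoding is invented. */
static int
mock_tph_st_get(const struct mock_pci_dev *dev,
		struct mock_tph_info *info, size_t count)
{
	size_t i;

	assert(dev != NULL && info != NULL);
	if (!dev->tph_enabled)
		return -EINVAL;
	for (i = 0; i < count; i++)
		info[i].st = (uint16_t)((info[i].cpu_id << 2) |
					info[i].cache_level);
	return 0;
}

/* What a PMD's Rx hints callback could do for a single queue. */
static int
pmd_rx_hints_set(struct mock_pci_dev *dev, struct mock_rxq_ctx *rxq,
		 uint32_t cpu_id, uint8_t cache_level)
{
	struct mock_tph_info info = {
		.cpu_id = cpu_id,
		.cache_level = cache_level,
		.st = 0,
	};
	int ret;

	ret = mock_tph_enable(dev, MOCK_TPH_MODE_DEV_SPEC);
	if (ret != 0)
		return ret;
	ret = mock_tph_st_get(dev, &info, 1);
	if (ret != 0)
		return ret;
	rxq->st = info.st; /* the ST lives in the queue context */
	return 0;
}
```

In interrupt-vector mode the last step would differ, since the ST would be
taken from an ST table entry rather than from a queue context.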
>>
>> V4->V5:
>>  * Enable stashing-hints (TPH) in Intel i40e driver.
>>  * Update exported symbol version from 25.03 to 25.07.
>>  * Add TPH mode macros.
>>
>> V3->V4:
>>  * Add VFIO IOCTL based ST extraction mechanism to Linux PCI bus driver
>>  * Remove ST extraction via direct access to ACPI _DSM
>>  * Replace rte_pci_extract_tph_st() with rte_pci_tph_st_get() in PCI
>>    bus driver.
>>
>> Wathsala Vithanage (4):
>>   pci: add non-merged Linux uAPI changes
>>   bus/pci: introduce the PCIe TLP Processing Hints API
>>   ethdev: introduce the cache stashing hints API
>>   net/i40e: enable TPH in i40e
>>
>>  drivers/bus/pci/bsd/pci.c            |  43 +++++++
>>  drivers/bus/pci/bus_pci_driver.h     |  52 ++++++++
>>  drivers/bus/pci/linux/pci.c          | 100 ++++++++++++++++
>>  drivers/bus/pci/linux/pci_init.h     |  14 +++
>>  drivers/bus/pci/linux/pci_vfio.c     | 170 +++++++++++++++++++++++++++
>>  drivers/bus/pci/private.h            |   8 ++
>>  drivers/bus/pci/rte_bus_pci.h        |  67 +++++++++++
>>  drivers/bus/pci/windows/pci.c        |  43 +++++++
>>  drivers/net/intel/i40e/i40e_ethdev.c | 127 ++++++++++++++++++++
>>  kernel/linux/uapi/linux/vfio_tph.h   | 102 ++++++++++++++++
>>  lib/ethdev/ethdev_driver.h           |  66 +++++++++++
>>  lib/ethdev/rte_ethdev.c              | 149 +++++++++++++++++++++++
>>  lib/ethdev/rte_ethdev.h              | 158 +++++++++++++++++++++++++
>>  lib/pci/rte_pci.h                    |  15 +++
>>  14 files changed, 1114 insertions(+)
>>  create mode 100644 kernel/linux/uapi/linux/vfio_tph.h
>>
> 
