On Mon, Jun 29, 2026 at 1:03 PM Stephen Hemminger <[email protected]> wrote: > > On Mon, 29 Jun 2026 12:10:27 -0700 > Joshua Washington <[email protected]> wrote: > > > On Mon, Jun 29, 2026 at 10:38 AM Stephen Hemminger > > <[email protected]> wrote: > > > > > > On Sun, 28 Jun 2026 17:50:03 -0700 > > > Mark Blasko <[email protected]> wrote: > > > > > > > In AF_PACKET, the PMD reads the timestamp from socket ring headers > > > > because the kernel stack processes the packet and allocates a socket > > > > buffer. Since the kernel stack and socket buffer allocation are bypassed > > > > in AF_XDP, UMEM frames are given directly to the PMD, which > > > > means the kernel never generates these socket headers. Also, it > > > > looks like the TAP PMD doesn't support RX timestamps. > > > > > > > > Since the location of metadata in the UMEM headroom is determined > > > > by the XDP program, the current driver implementation is coupled to > > > > a specific XDP program layout. As an alternative, we could just > > > > plumb the entire metadata headroom to the DPDK application and let > > > > the application parse it based on whatever XDP program is in use. > > > > That would couple the application and the XDP program, but would > > > > at least decouple the driver and the XDP program. > > > > > > > > > Sorry if I was not clear enough before. > > > The DPDK PMD provides an abstraction to applications to avoid exposing > > > as many details as possible. Whenever possible a PMD should follow > > > precedent and implement functions in a manner similar to other drivers. > > > > > > The method of expressing received timestamps was never well described > > > in DPDK documentation. The convention is: > > > - A dynamic field in mbuf is used for the timestamp. > > > - All drivers using timestamp should register the same field > > > using rte_mbuf_dyn_rx_timestamp_register. > > > - The mbuf field is filled inside the rx_burst processing. > > > - The timestamp dynamic field is a 64 bit unsigned number rolling > > > clock value. > > > - A PMD providing timestamp, must also define a readclock ethdev dev ops > > > so that application can compute the number of ticks in timestamp per > > > time interval. > > > - A PMD providing timestamp must advertise that in offload flags. > > > - Timestamp should only be inserted if the Rx timestamp offload flag > > > is set during configuration. > > > > These are all very good points for ensuring an airtight abstraction. > > However, I'd like to note that neither of the kernel-bound PMD > > interfaces (pcap, AF_PACKET) have support for the `read_clock` op. I > > am not certain of the reason for this, but I suspect it's because > > ethtool IOCTLs already cover that functionality. This patch's > > implementation follows the other implementations pretty closely, > > differing only in how the timestamp is extracted. > > > > > > > > When I read this was a little confused about meta data in mbuf. > > > Would have been clearer with a simple helper: > > > > > > static inline void > > > af_xdp_rx_timestamp(struct rte_mbuf *m) > > > { > > > const struct af_xdp_rx_metadata *meta > > > = rte_pktmbuf_mtod_offset(m, struct af_xdp_rx_metadata, > > > -(int)sizeof(*meta)); > > > > > > *RTE_MBUF_DYNFIELD(m, timestamp_dynfield_offset, uint64_t *) = > > > meta->rx_timestamp; > > > m->ol_flags |= timestamp_dynflag; > > > } > > > > > > If you are going to do Rx timestamp then a simple readclock ethdev op > > > is also needed to tell the application what the units are. > > > > > > But there are bigger issues with this patch: > > > 1. The patch assumes metadata is always present in XDP receive but device > > > used by XDP may not implement it. There is no capability checking. > > > The DPDK PMD ends up advertising RTE_ETH_RX_OFFLOAD_TIMESTAMP > > > unconditionally even if underling kernel device doesn't do it. > > > > The patch as submitted is incorrect; my apologies for that. I missed > > it in review. The layout of XDP metadata is something which is > > determined by the XDP program and consumed by an AF_XDP application > > tailor-made for the XDP program. This means that the PMD itself would > > not be able to process the RX timestamp from an XDP program on its > > own; that functionality would have to be handled by the DPDK > > application. > > > > > > > > 2. The format of metadata is not a documented contract between kernel > > > device implementing XDP and the DPDK. > > > > This is true. The contract lies between the XDP program and the AF_XDP > > application, which can both be controlled by the DPDK application and > > its invocation. > > > > > > > > 3. The XDP documentation says you need to check for metadata > > > in each frame. > > > > > > 4. There are no head bounds checks; must check that there is > > > packet headroom is configured with enough space for metadata. > > > > > > I am sure AI will find several more things but need fixing > > > before ready to merge. > > > > > > > Unfortunately, I think that some amount of abstraction leakage is > > unavoidable by the very nature of AF_XDP. What I propose is as > > follows: > > > > Introduce a new PMD capability for processing XDP metadata. The mbuf > > dyn_fields can store a pointer to the metadata and the size of the > > metadata. The DPDK application processes the metadata as it sees fit. > > As mentioned before, both the reader and the writer of the XDP > > metadata are both controllable, and this would add the flexiblity for > > one to use pre-existing DPDK mbuf infrastructure with minor > > modifications to support this functionality, rather than introducing a > > full rewrite using an AF_XDP application. > > > > Would this be acceptable? > > There needs to be an API that XDP PMD can call to check if underlying > driver will add metadata; and another API to check that buffer has > valid timestamp
I don't think the API to check if the driver has metadata support exists, unfortunately. bpf_xdp_adjust_meta() is used to by XDP programs to adjust the metadata pointer on a packet. If xdp_md->data_meta > xdp_md->data then metadata isn't supported, and the adjust call will fail. Otherwise, drivers have up to 256B of metadata. What this means for the PMD is that it can read data_meta and data pointers to decide if the metadata is valid on a per-packet basis, and pass the metadata to the application when appropriate. We were thinking to implement this via the vdev parameters passed in as part of invoking the application. This solution would introduce three new parameters: 1. `xdp_meta_valid_hint_offset` (byte offset of RX validity flag(s)) 2. `xdp_meta_rx_ts_valid_mask` (mask to read if the RX timesamp is valid. Mask covers a single byte.) 3. `xdp_meta_rx_ts_offset` (offset of RX timestamp in XDP metadata. RX timestamp is assumed to be 8 bytes by kernel convention.) modeled after xdp_metadata.h in the kernel. This solution would be quite verbose, so I'm open to suggested improvements. If the valid mask/hint offset are not passed in the invocation parameters, the timestamp would be assumed valid if the metadata is valid and has enough bytes available to feasibly read the timestamp value at the intended offset.

