---------- Forwarded message ---------
From: Toke Høiland-Jørgensen <[email protected]>
Date: Thu, Dec 8, 2022 at 3:06 PM
Subject: Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
To: Stanislav Fomichev <[email protected]>, <[email protected]>
Cc: <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>, Saeed Mahameed <[email protected]>, David Ahern <[email protected]>, Jakub Kicinski <[email protected]>, Willem de Bruijn <[email protected]>, Jesper Dangaard Brouer <[email protected]>, Anatoly Burakov <[email protected]>, Alexander Lobakin <[email protected]>, Magnus Karlsson <[email protected]>, Maryam Tahhan <[email protected]>, <[email protected]>, <[email protected]>

Stanislav Fomichev <[email protected]> writes:

> From: Toke Høiland-Jørgensen <[email protected]>
>
> Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> XDP ctx to do this.

So I finally managed to get enough ducks in a row to actually benchmark
this, with the caveat that I suddenly can't get the timestamp support to
work (it was working in an earlier version, but now timestamp_supported()
just returns false). I'm not sure if this is an issue with the enablement
patch, or if I just haven't gotten the hardware configured properly. I'll
investigate some more, but figured I'd post these results now:

Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
Overhead:                   1,754,153 pps /  2.86 ns/pkt

As per the above, this is with calling three kfuncs/pkt
(metadata_supported(), rx_hash_supported() and rx_hash()). So that's
~0.95 ns per function call, which is a bit less than, but not far off from,
the ~1.2 ns that I'm used to. The tests where I accidentally called the
default kfuncs cut off ~1.3 ns for one less kfunc call, so it's definitely
in that ballpark.

I'm not doing anything with the data, just reading it into an on-stack
buffer, so this is the smallest possible delta from just getting the data
out of the driver. I did confirm that the call instructions are still in
the BPF program bytecode when it's dumped back out from the kernel.

-Toke

-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC
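
For anyone wanting to re-derive the per-packet numbers in the results above: the ns/pkt figures are just 1e9 divided by the packet rate, and the per-call estimate is the ns/pkt delta divided by the three kfunc calls. A minimal sketch of that arithmetic follows; the variable and function names are made up for illustration, and it assumes the overhead is spread evenly across the three calls, which the measurement itself does not establish.

```python
# Re-derive the per-packet and per-call figures from the reported packet rates.
# Names are illustrative only; the inputs are the numbers quoted in the mail.

NS_PER_SECOND = 1e9


def ns_per_packet(pps: float) -> float:
    """Convert a packet rate (packets/second) into time per packet (ns)."""
    return NS_PER_SECOND / pps


baseline_pps = 25_678_262   # XDP_DROP only
metadata_pps = 23_924_109   # XDP_DROP + reading metadata
kfunc_calls = 3             # metadata_supported(), rx_hash_supported(), rx_hash()

baseline_ns = ns_per_packet(baseline_pps)
metadata_ns = ns_per_packet(metadata_pps)

overhead_pps = baseline_pps - metadata_pps
overhead_ns = metadata_ns - baseline_ns

# Assumption: the cost is split evenly between the three calls.
per_call_ns = overhead_ns / kfunc_calls

print(f"Baseline:  {baseline_ns:.2f} ns/pkt")
print(f"Metadata:  {metadata_ns:.2f} ns/pkt")
print(f"Overhead:  {overhead_pps:,} pps / {overhead_ns:.2f} ns/pkt")
print(f"Per call:  {per_call_ns:.2f} ns")
```

Note that the overhead in ns/pkt is the difference of the two per-packet times, not 1e9 divided by the pps difference.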
