>
>Performance was measured with l3fwd forwarding between two ports of an
>Intel E810-XXV 2x25G NIC (1 RX queue per port). Two graph worker threads
>ran on hyper threads of the same physical core on an Intel Xeon Silver
>4316 CPU @ 2.30GHz.
>
>Results:
>- Baseline (manual speculation): 37.0 Mpps
>- Deferred API: 36.2 Mpps (-2.2%)
>

On the Octeon (Neoverse N2) platform, we see a slight performance increase of ~1.5%.

>The slight overhead comes from per-packet edge comparisons. However,
>this is offset by:
>- 826 fewer lines of code across 13 node implementations
>- Reduced instruction cache pressure from simpler code paths
>- Elimination of per-node speculation boilerplate
>- Easier development of new nodes

