> From: Bruce Richardson [mailto:[email protected]]
> Sent: Friday, 30 January 2026 11.53
>
> On Fri, Jan 30, 2026 at 10:46:16AM +0000, Morten Brørup wrote:
> > For CPU architectures without strict alignment requirements, operations on
> > 6-byte Ethernet addresses using three 2-byte operations were replaced by a
> > 4-byte and a 2-byte operation, i.e. two operations instead of three.
> >
> > Comparison functions are pure, so added __rte_pure.
> >
> > Removed superfluous parentheses. (No functional change.)
> >
> > Signed-off-by: Morten Brørup <[email protected]>
> > ---
> >  lib/net/rte_ether.h | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/net/rte_ether.h b/lib/net/rte_ether.h
> > index c9a0b536c3..5552d3c1f6 100644
> > --- a/lib/net/rte_ether.h
> > +++ b/lib/net/rte_ether.h
> > @@ -99,13 +99,19 @@ static_assert(alignof(struct rte_ether_addr) == 2,
> >   * True (1) if the given two ethernet address are the same;
> >   * False (0) otherwise.
> >   */
> > +__rte_pure
> >  static inline int rte_is_same_ether_addr(const struct rte_ether_addr *ea1,
> >  				     const struct rte_ether_addr *ea2)
> >  {
> > +#if !defined(RTE_ARCH_STRICT_ALIGN)
> > +	return ((((const unaligned_uint32_t *)ea1)[0] ^ ((const unaligned_uint32_t *)ea2)[0]) |
> > +			(((const uint16_t *)ea1)[2] ^ ((const uint16_t *)ea2)[2])) == 0;
> > +#else
> >  	const uint16_t *w1 = (const uint16_t *)ea1;
> >  	const uint16_t *w2 = (const uint16_t *)ea2;
> >
> >  	return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) == 0;
> > +#endif
> >  }
>
> Is this actually faster?
It's a simple micro-optimization, so I haven't benchmarked it.

On x86, the compiled function is simplified and reduced in size from 34 to 24 bytes:

00000000004ed650 <review_rte_is_same_ether_addr>:
  4ed650:  0f b7 07                movzwl (%rdi),%eax
  4ed653:  0f b7 57 02             movzwl 0x2(%rdi),%edx
  4ed657:  66 33 06                xor    (%rsi),%ax
  4ed65a:  66 33 56 02             xor    0x2(%rsi),%dx
  4ed65e:  09 d0                   or     %edx,%eax
  4ed660:  0f b7 57 04             movzwl 0x4(%rdi),%edx
  4ed664:  66 33 56 04             xor    0x4(%rsi),%dx
  4ed668:  66 09 d0                or     %dx,%ax
  4ed66b:  0f 94 c0                sete   %al
  4ed66e:  0f b6 c0                movzbl %al,%eax
  4ed671:  c3                      ret
  4ed672:  66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
  4ed679:  00 00 00 00
  4ed67d:  0f 1f 00                nopl   (%rax)

00000000004ed680 <rte_is_same_ether_addr_improved>:
  4ed680:  0f b7 47 04             movzwl 0x4(%rdi),%eax
  4ed684:  66 33 46 04             xor    0x4(%rsi),%ax
  4ed688:  8b 17                   mov    (%rdi),%edx
  4ed68a:  33 16                   xor    (%rsi),%edx
  4ed68c:  0f b7 c0                movzwl %ax,%eax
  4ed68f:  09 c2                   or     %eax,%edx
  4ed691:  0f 94 c0                sete   %al
  4ed694:  0f b6 c0                movzbl %al,%eax
  4ed697:  c3                      ret
  4ed698:  0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  4ed69f:  00

For reference, memcpy() of 6 bytes (compile-time constant) also compiles to a 4-byte and a 2-byte operation, not three 2-byte operations.

> For architectures that support strict alignment,
> this looks like something that the compilers should be doing using proper
> cost-benefit evaluation based on target architecture, rather than us doing
> it in our code.

I agree with the high-level message in your comment. DPDK contains some manual optimizations from back in the day, and the evolution of compilers has made some of them obsolete.

In this case, GCC doesn't optimize it, so I did it manually. I haven't checked whether other compilers are clever enough to do it.
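
For anyone who wants to reproduce the comparison outside of DPDK, here is a minimal, self-contained sketch of the two variants. The struct and function names are made up for illustration (in DPDK the real struct and function live in rte_ether.h, and the unaligned 32-bit access uses unaligned_uint32_t from rte_common.h); a memcpy() into a local uint32_t stands in for that cast here:

#include <stdint.h>
#include <string.h>

/* Stand-in for struct rte_ether_addr: 6 bytes, 2-byte aligned. */
struct eth_addr {
	uint8_t addr_bytes[6];
} __attribute__((aligned(2)));

/* Current variant: three 2-byte loads per address. */
int
eth_addr_same_3x16(const struct eth_addr *ea1, const struct eth_addr *ea2)
{
	const uint16_t *w1 = (const uint16_t *)ea1;
	const uint16_t *w2 = (const uint16_t *)ea2;

	return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) == 0;
}

/* Patched variant: one 4-byte load plus one 2-byte load per address.
 * memcpy() into a local stands in for DPDK's unaligned_uint32_t cast;
 * with optimization enabled it becomes a single 32-bit load on x86. */
int
eth_addr_same_32_16(const struct eth_addr *ea1, const struct eth_addr *ea2)
{
	uint32_t d1, d2;

	memcpy(&d1, ea1->addr_bytes, sizeof(d1));
	memcpy(&d2, ea2->addr_bytes, sizeof(d2));

	return ((d1 ^ d2) |
		(((const uint16_t *)ea1)[2] ^ ((const uint16_t *)ea2)[2])) == 0;
}

Building this with e.g. "gcc -O3 -c" and disassembling with "objdump -d" should give code along the lines of the two listings above.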

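As a side note on the memcpy() remark: a tiny, hypothetical example (the helper name is made up, not part of the patch) of copying 6 bytes with a compile-time-constant length, which GCC on x86 typically expands inline as one 4-byte and one 2-byte move rather than three 2-byte moves or a library call:

#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: copy a 6-byte Ethernet
 * address.  With optimization enabled, GCC on x86 typically emits one
 * 4-byte and one 2-byte move here instead of calling memcpy(). */
void
eth_addr_copy6(uint8_t dst[6], const uint8_t src[6])
{
	memcpy(dst, src, 6);
}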
