On Mon, May 30, 2022 at 12:42 PM Bruce Richardson
<bruce.richard...@intel.com> wrote:
>
> On Mon, May 30, 2022 at 10:00:34AM +0200, Morten Brørup wrote:
> > > From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> > > Sent: Monday, 30 May 2022 09.52
> > >
> > > On Fri, May 27, 2022 at 01:15:20PM -0700, Stephen Hemminger wrote:
> > > > On Fri, 27 May 2022 20:18:22 +0200
> > > > Stanislaw Kardach <k...@semihalf.com> wrote:
> > > >
> > > > > +static inline void
> > > > > +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t
> > > hop[4],
> > > > > +               uint32_t defv)
> > > > > +{
> > > > > +       uint32_t nh;
> > > > > +       int i, ret;
> > > > > +
> > > > > +       for (i = 0; i < 4; i++) {
> > > > > +               ret = rte_lpm_lookup(lpm, ((rte_xmm_t)ip).u32[i], 
> > > > > &nh);
> > > > > +               hop[i] = (ret == 0) ? nh : defv;
> > > > > +       }
> > > > > +}
> > > >
> > > > For performance, manually unroll the loop.
> > >
> > > Given a constant 4x iterations, will compilers not unroll this
> > > automatically. I think the loop is a little clearer if it can be kept
> > >
> > > /Bruce
> >
> > If in doubt, add this and look at the assembler output:
> >
> > #define REVIEW_INLINE_FUNCTIONS 1
> >
> > #if REVIEW_INLINE_FUNCTIONS /* For compiler output review purposes only. */
> > #pragma GCC diagnostic push
> > #pragma GCC diagnostic ignored "-Wmissing-prototypes"
> > void review_rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t 
> > hop[4], uint32_t defv)
> > {
> >       rte_lpm_lookupx4(lpm, ip, hop, defv);
> > }
> > #pragma GCC diagnostic pop
> > #endif /* REVIEW_INLINE_FUNCTIONS */
> >
>
> Used godbolt.org to check and indeed the function is not unrolled.
> (Gcc 11.2, with flags "-O3 -march=icelake-server").
>
> Manually unrolling changes the assembly generated in interesting ways. For
> example, it appears to generate more cmov-type instructions for the
> miss/default-value case rather than using branches as in the looped
> version. Whether this is better or not may depend upon usecase - if one
> expects most lpm lookup entries to hit, then having (predictable) branches
> may well be cheaper.
>
> In any case, I'll withdraw any object to unrolling, but I'm still not
> convinced it's necessary.
>
> /Bruce
Interestingly enough until I've defined unlikely() in godbolt, I did
not get any automatic unrolling on godbolt (either with x86 or RISC-V
GCC). Did you get any compilation warnings?
That said it only happens on O3 since it implies -fpeel-loops. O3 is
the default for DPDK.

Reply via email to