Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

Andi Kleen Tue, 12 Aug 2025 22:01:13 -0700

> > It might be reasonable to tweak the costs per CPU however, I haven't
> > done that.
> >
> > BTW for rotate the wins are much higher because there are no native
> > instructions for it.
> For ashl/lshr, the original implementation only takes 2
> instructions (vpsllw/vpsrlw + vpand), and for ashr when shift count is


But it uses two registers. Shorter dependency chains are usually better.
Also, this case is faster with GFNI in my microbenchmarks, although not by
much (1-4%).
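
For illustration only, here is a scalar C sketch of the two approaches being compared. It is not taken from the patch. It models one byte lane of GF2P8AFFINEQB using the bit ordering in Intel's documented pseudocode, builds the bit matrices for a constant logical right shift and a constant rotate left, and checks them against plain C shifts alongside a model of the vpsrlw + vpand sequence. All helper names (`affine_byte`, `lshr_matrix`, `rotl_matrix`, `lshr_bytes_via_word`, `count_mismatches`, `self_check`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one byte lane of GF2P8AFFINEQB:
   result bit i = parity(matrix byte [7 - i] AND x) XOR imm bit i. */
static uint8_t affine_byte(uint64_t matrix, uint8_t x, uint8_t imm)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t row = (uint8_t)(matrix >> (8 * (7 - i)));
        uint8_t t = row & x;
        t ^= t >> 4;
        t ^= t >> 2;
        t ^= t >> 1;                      /* parity is now in bit 0 */
        r |= (uint8_t)(((t ^ (imm >> i)) & 1) << i);
    }
    return r;
}

/* Hypothetical helper: matrix for a logical right shift by n (0..7).
   Result bit i takes source bit i + n, or zero when that is out of range. */
static uint64_t lshr_matrix(int n)
{
    uint64_t m = 0;
    for (int i = 0; i + n < 8; i++)
        m |= (uint64_t)(1u << (i + n)) << (8 * (7 - i));
    return m;
}

/* Hypothetical helper: matrix for a rotate left by n (0..7).
   Result bit i takes source bit (i - n) mod 8. */
static uint64_t rotl_matrix(int n)
{
    uint64_t m = 0;
    for (int i = 0; i < 8; i++)
        m |= (uint64_t)(1u << ((i + 8 - n) & 7)) << (8 * (7 - i));
    return m;
}

/* Model of vpsrlw + vpand on one 16-bit lane holding two bytes:
   shift the whole word, then mask off the bits that crossed the byte boundary. */
static uint16_t lshr_bytes_via_word(uint16_t w, int n)
{
    uint16_t mask = (uint16_t)(0x0101u * (0xFFu >> n));
    return (uint16_t)((w >> n) & mask);
}

/* Compare every byte value and every shift count against plain C. */
int count_mismatches(void)
{
    int bad = 0;
    for (int n = 0; n < 8; n++) {
        for (int x = 0; x < 256; x++) {
            uint8_t b = (uint8_t)x;
            uint8_t lo = (uint8_t)~b;
            uint8_t rot = (uint8_t)((b << n) | (b >> ((8 - n) & 7)));
            uint16_t w = (uint16_t)((b << 8) | lo);
            uint16_t want = (uint16_t)(((b >> n) << 8) | (lo >> n));

            bad += affine_byte(lshr_matrix(n), b, 0) != (uint8_t)(b >> n);
            bad += affine_byte(rotl_matrix(n), b, 0) != rot;
            bad += lshr_bytes_via_word(w, n) != want;
        }
    }
    return bad;
}

void self_check(void)
{
    assert(count_mismatches() == 0);
}
```

Calling `self_check()` from a test driver exercises all 8 shift counts over all 256 byte values. The shift needs a shift plus a mask in the word-based form, while the affine form is a single operation with a constant matrix, and the same single operation covers rotates.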

> 7, it only takes 1 instruction (vpcmpgtb), i.e.
> https://godbolt.org/z/Wef97YqGx

Makes sense.

-Andi
