On Tue, May 19, 2026 at 10:51:37AM +0300, Alexander Monakov wrote:
> > We don't use vpermilps insn for V4S[IF]mode variable permutations on
> > TARGET_AVX without TARGET_AVX512*.  For TARGET_AVX512* there are plenty
> > of permutation instructions already.  For TARGET_AVX2, the function has
> > special cases for one_operand_shuffle for V8SImode/V8SFmode and emits
> > reasonable code, but for V4SImode/V4SFmode with TARGET_AVX2 it handles
> > those using V8SImode/V8SFmode as two operand shuffle, which requires
> > 2 preparation instructions, vpermd and one finalization instruction.
> > And for !TARGET_AVX2 && TARGET_AVX we just emit terrible code for these.
> > 
> > So, the following patch uses vpermilps for V4S[IF]mode one_operand_shuffle.
> 
> Thanks for looking at the issue, I really appreciate it. The same problem
> exists with 64-bit lanes (V2DF/V2DI modes, we fail to utilize vpermilpd).

The control in that case is in bits 1 and 65 rather than 0 and 64.
So, in order to use vpermilpd for
__builtin_shuffle (v2di_or_v2df, v2di);
one would need to first shift the mask left by one (or vpaddq it with itself).
Though, that is still shorter than what we emit right now.
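
To illustrate the bit position issue, here is a rough scalar model in plain C,
not the actual expander code. The function names model_vpermilpd and
shuffle_v2df are made up for this sketch. It assumes the 128-bit variable form
of vpermilpd, which looks only at bit 1 of each 64-bit control element, and the
__builtin_shuffle rule that mask elements are taken modulo the element count.

```c
#include <stdint.h>

/* Scalar model of the 128-bit variable vpermilpd (hypothetical helper):
   result lane i is src[bit 1 of ctrl[i]]; all other control bits are
   ignored.  */
static void
model_vpermilpd (double dst[2], const double src[2], const uint64_t ctrl[2])
{
  /* Copy first so that dst may alias src.  */
  double tmp[2] = { src[0], src[1] };
  for (int i = 0; i < 2; i++)
    dst[i] = tmp[(ctrl[i] >> 1) & 1];
}

/* Model of __builtin_shuffle (v2df, v2di) built on top of the above
   (hypothetical helper).  The shuffle index lives in bit 0 of each mask
   element, so the mask is doubled first, which is what vpaddq mask, mask
   would compute, moving bit 0 into bit 1.  */
static void
shuffle_v2df (double dst[2], const double src[2], const uint64_t mask[2])
{
  uint64_t ctrl[2];
  for (int i = 0; i < 2; i++)
    ctrl[i] = mask[i] + mask[i];
  model_vpermilpd (dst, src, ctrl);
}
```

With src = { 10.0, 20.0 } and mask = { 1, 0 }, shuffle_v2df swaps the two
lanes, while passing the same mask to model_vpermilpd directly selects lane 0
twice, because bit 1 of both control elements is clear.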

        Jakub
