On Tue, May 21, 2024 at 8:16 AM Haochen Jiang <haochen.ji...@intel.com> wrote:
>
> Hi all,
>
> Since vpermq is really slow, we should avoid using it when it is
> the only instruction that could be used for ix86_expand_vecop_qihi2.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk?
>
> Thx,
> Haochen
>
> gcc/ChangeLog:
>
>         PR target/115069
>         * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
>         Do not enable the optimization when AVX512BW is not enabled.
> ---
>  gcc/config/i386/i386-expand.cc | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index a6132911e6a..f24c800bb4f 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -24323,6 +24323,11 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
>    bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
>    bool uns_p = code != ASHIFTRT;
>
> +  /* vpermq is slow and we should not fall into the optimization when
> +     it is the only instruction to be selected.  */

Please rather say something like:

/* Without VPMOVWB (provided by AVX512BW ISA), the expansion uses the
   generic permutation to merge the data back into the right place.
   This permutation results in VPERMQ, which is slow, so better fall
   back to expand_vecop_qihi.  */

Uros.

> +  if (!TARGET_AVX512BW)
> +    return false;
> +
>    if ((qimode == V16QImode && !TARGET_AVX2)
>        || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
>        /* There are no V64HImode instructions.  */
> --
> 2.31.1
>
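
For readers unfamiliar with the scheme under discussion, here is a minimal
scalar sketch of the "widen to 16 bits, operate, truncate back to 8 bits"
idea. It is not the GCC implementation: the enum, the function names and the
per-element loop are hypothetical, and the operation set is an assumption
made for illustration. Only the signedness rule is taken from the hunk above
(uns_p = code != ASHIFTRT). The final cast models the word-to-byte truncation
that VPMOVWB performs per element; without that instruction, the halves have
to be merged back with a permutation instead. Shift counts are assumed to be
in the range 0..7.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the rtx codes handled by the expander.  */
enum vec_op { OP_MULT, OP_ASHIFT, OP_LSHIFTRT, OP_ASHIFTRT };

/* One element: widen the byte to 16 bits, operate, truncate to 8 bits.
   Assumes GCC's documented behaviour for narrowing signed conversions
   and for arithmetic right shift of negative values.  */
static uint8_t
widen_op_truncate (enum vec_op code, uint8_t a, uint8_t b)
{
  /* Only the arithmetic right shift needs sign extension.  */
  int uns_p = code != OP_ASHIFTRT;
  uint16_t wa = uns_p ? (uint16_t) a : (uint16_t) (int16_t) (int8_t) a;
  uint16_t r = 0;

  switch (code)
    {
    case OP_MULT:     r = (uint16_t) (wa * b); break;
    case OP_ASHIFT:   r = (uint16_t) (wa << b); break;
    case OP_LSHIFTRT: r = (uint16_t) (wa >> b); break;
    case OP_ASHIFTRT: r = (uint16_t) ((int16_t) wa >> b); break;
    }

  /* Keep the low byte of each 16-bit result (the VPMOVWB step).  */
  return (uint8_t) r;
}

/* Apply the scalar model to every element of a byte vector.  */
static void
vecop_qi_via_hi (enum vec_op code, uint8_t *dst,
                 const uint8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = widen_op_truncate (code, a[i], b[i]);
}
```

In the real expander the widening and the operation act on whole vectors at
once; the sketch only shows why a cheap narrowing step matters for the final
merge.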
