Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi All,
>
> The current vector extract pattern can only extract from a vector when the
> position to extract is a multiple of the bitsize of the vector being
> extracted.
>
> That means extracting something like a V2SI from a V4SI vector at position
> 32 isn't possible, since 32 is not a multiple of 64.  Ideally this optab
> would have worked on multiples of the element size, but too many targets
> rely on the current semantics now.
>
> So instead add a new case which allows any extraction as long as the bit
> position is a multiple of the element size.  We use a VEC_PERM to shuffle
> the elements into the bottom part of the vector and then use a subreg to
> extract the values out.  This now allows various vector operations that
> were previously being decomposed into very inefficient scalar operations.
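
If I'm reading the description right, for the V2SI-from-V4SI example the
new lowering is effectively doing this, sketched in source terms (the
real transformation happens during RTL expansion; __builtin_shufflevector
is used here purely for exposition):

    typedef unsigned int v4si __attribute__((vector_size (16)));
    typedef unsigned int v2si __attribute__((vector_size (8)));

    v2si extract (v4si x)
    {
      /* Rotate the wanted elements down to the start of the vector...  */
      v4si tmp = __builtin_shufflevector (x, x, 1, 2, 3, 0);
      /* ...then read the low 64 bits back out (the subreg step).  */
      v2si res = {tmp[0], tmp[1]};
      return res;
    }
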
>
> NOTE: I added 3 testcases, but I only fixed the 3rd one.
>
> The 1st one is missed because we don't optimize VEC_PERM expressions into
> bitfields.  The 2nd one is missed because extract_bit_field only works on
> vector modes; in this case the intermediate extract is DImode.
>
> On targets where the scalar mode is tieable to vector modes the extract
> should work fine.
>
> However I ran out of time to fix the first two, so I will do so in GCC 14.
> For now this catches the case that my pattern now introduces more easily.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>       * expmed.cc (extract_bit_field_1): Add support for vector element
>       extracts.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/aarch64/ext_1.c: New.
>
> --- inline copy of patch -- 
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index bab020c07222afa38305ef8d7333f271b1965b78..ffdf65210d17580a216477cfe4ac1598941ac9e4 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -1718,6 +1718,45 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>             return target;
>           }
>       }
> +      else if (!known_eq (bitnum, 0U)
> +            && multiple_p (GET_MODE_UNIT_BITSIZE (tmode), bitnum, &pos))
> +     {
> +       /* The encoding has a single stepped pattern.  */
> +       poly_uint64 nunits = GET_MODE_NUNITS (new_mode);
> +       int nelts = nunits.to_constant ();
> +       vec_perm_builder sel (nunits, nelts, 1);
> +       int delta = -pos.to_constant ();
> +       for (int i = 0; i < nelts; ++i)
> +         sel.quick_push ((i - delta) % nelts);
> +       vec_perm_indices indices (sel, 1, nunits);

Thanks for doing this, looks good.  But I don't think the to_constant
calls are safe.  new_mode and pos could in principle be nonconstant.

To build a stepped pattern, we just need:

    vec_perm_builder sel (nunits, 1, 3);

and then push pos, pos + 1, and pos + 2 to it.  There's no need to
clamp the position to nelts; that happens automatically.
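
In other words, something like this in place of the quoted hunk above
(untested):

    /* Build the stepped selector pos, pos + 1, pos + 2; the encoding
       extends the series to the full number of elements and wraps the
       indices for us, so everything can stay as a poly_uint64.  */
    vec_perm_builder sel (nunits, 1, 3);
    for (int i = 0; i < 3; ++i)
      sel.quick_push (i + pos);
    vec_perm_indices indices (sel, 1, nunits);
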

> +
> +       if (can_vec_perm_const_p (new_mode, new_mode, indices, false))
> +         {
> +           class expand_operand ops[4];
> +           machine_mode outermode = new_mode;
> +           machine_mode innermode = tmode;
> +           enum insn_code icode
> +             = direct_optab_handler (vec_perm_optab, outermode);
> +           target = gen_reg_rtx (outermode);
> +           if (icode != CODE_FOR_nothing)
> +             {
> +               rtx sel = vec_perm_indices_to_rtx (outermode, indices);
> +               create_output_operand (&ops[0], target, outermode);
> +               ops[0].target = 1;
> +               create_input_operand (&ops[1], op0, outermode);
> +               create_input_operand (&ops[2], op0, outermode);
> +               create_input_operand (&ops[3], sel, outermode);

I think this should be GET_MODE (sel).  Looks like the current
version would ICE for float vectors.  That said...

> +               if (maybe_expand_insn (icode, 4, ops))
> +                 return simplify_gen_subreg (innermode, target, outermode, 0);
> +             }
> +           else if (targetm.vectorize.vec_perm_const != NULL)
> +             {
> +               if (targetm.vectorize.vec_perm_const (outermode, outermode,
> +                                                     target, op0, op0, indices))
> +                 return simplify_gen_subreg (innermode, target, outermode, 0);
> +             }

...can we use expand_vec_perm_const here?  It will try the constant
expansion first, which is the preferred order.  It also has a few
variations up its sleeve.
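
Roughly (untested; note that expand_vec_perm_const takes the
vec_perm_builder rather than the vec_perm_indices, and I'm guessing at
the sel_mode argument, which may need to be the corresponding integer
vector mode given the float-vector point above):

    rtx res = expand_vec_perm_const (outermode, op0, op0, sel,
                                     outermode, target);
    if (res)
      return simplify_gen_subreg (innermode, res, outermode, 0);
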

Thanks,
Richard


> +         }
> +     }
>      }
>  
>    /* See if we can get a better vector mode before extracting.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c b/gcc/testsuite/gcc.target/aarch64/ext_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..18a10a14f1161584267a8472e571b3bc2ddf887a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c
> @@ -0,0 +1,54 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include <string.h>
> +
> +typedef unsigned int v4si __attribute__((vector_size (16)));
> +typedef unsigned int v2si __attribute__((vector_size (8)));
> +
> +/*
> +** extract: { xfail *-*-* }
> +**   ext     v0.16b, v0.16b, v0.16b, #4
> +**   ret
> +*/
> +v2si extract (v4si x)
> +{
> +    v2si res = {x[1], x[2]};
> +    return res;
> +}
> +
> +/*
> +** extract1: { xfail *-*-* }
> +**   ext     v0.16b, v0.16b, v0.16b, #4
> +**   ret
> +*/
> +v2si extract1 (v4si x)
> +{
> +    v2si res;
> +    memcpy (&res, ((int*)&x)+1, sizeof(res));
> +    return res;
> +}
> +
> +typedef struct cast {
> +  int a;
> +  v2si b __attribute__((packed));
> +} cast_t;
> +
> +typedef union Data {
> +   v4si x;
> +   cast_t y;
> +} data;  
> +
> +/*
> +** extract2:
> +**   ext     v0.16b, v0.16b, v0.16b, #4
> +**   ret
> +*/
> +v2si extract2 (v4si x)
> +{
> +    data d;
> +    d.x = x;
> +    return d.y.b;
> +}
> +
