On Tue, 24 Jun 2025, Richard Sandiford wrote:

> Tamar Christina <tamar.christ...@arm.com> writes:
> > store_bit_field_1 has an optimization where if a target is not a memory 
> > operand
> > and the entire value is being set from something larger we can just wrap a
> > subreg around the source and emit a move.
> >
> > For vector constructors this is however problematic because the subreg means
> > that the expansion of the constructor won't happen through vec_init anymore.
> >
> > Complicated constructors which aren't natively supported by targets then 
> > ICE as
> > they wouldn't have been expanded so recog fails.
> >
> > This patch blocks the optimization on non-constant vector constructors. Or 
> > non-uniform
> > non-constant vectors. I allowed constant vectors because if I read the code 
> > right
> > simplify-rtx should be able to perform the simplification of pulling out 
> > the element
> > or merging the constant values.  There are several testcases in 
> > aarch64-sve-pcs.exp
> > that test this as well. I allowed uniform non-constant vectors because they
> > would be folded into a vec_select later on.
> >
> > Note that codegen is quite horrible, for what should only be an lsr.  But 
> > I'll
> > address that separately so that this patch is backportable.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master? and GCC 15, 14, 13?
> 
> I was discussing this Alex off-list last week, and the fix we talked
> about there was:
> 
> diff --git a/gcc/explow.cc b/gcc/explow.cc
> index 7799a98053b..8b138f54f75 100644
> --- a/gcc/explow.cc
> +++ b/gcc/explow.cc
> @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op,
>             machine_mode innermode, poly_uint64 byte)
>  {
>    rtx x = simplify_gen_subreg (outermode, op, innermode, byte);
> -  if (x)
> +  if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x))))
>      return x;
>  
>    auto *start = get_last_insn ();
> 
> The justification is that force_subreg is somewhat like a "subreg
> version of force_operand", and so should try to avoid returning
> subregs that force_operand would have replaced.  The force_operand
> code I mean is:

Yeah, in particular CONSTANT_P isn't sth documented as valid as
subreg operands, only registers (and memory) are.  But isn't this
then a bug in simplify_gen_subreg itself, that it creates a SUBREG
of a non-REG/MEM?

Richard.

>   /* Check for subreg applied to an expression produced by loop optimizer.  */
>   if (code == SUBREG
>       && !REG_P (SUBREG_REG (value))
>       && !MEM_P (SUBREG_REG (value)))
>     {
>       value
>       = simplify_gen_subreg (GET_MODE (value),
>                              force_reg (GET_MODE (SUBREG_REG (value)),
>                                         force_operand (SUBREG_REG (value),
>                                                        NULL_RTX)),
>                              GET_MODE (SUBREG_REG (value)),
>                              SUBREG_BYTE (value));
>       code = GET_CODE (value);
>     }
> 
> Thanks,
> Richard
> 
> > Thanks,
> > Tamar
> >
> >
> > gcc/ChangeLog:
> >
> >     PR target/120718
> >     * expmed.cc (store_bit_field_1): Only push subreg over uniform vector
> >     constructors.
> >     (foldable_value_with_subreg): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> >     PR target/120718
> >     * gcc.target/aarch64/sve/pr120718.c: New test.
> >
> > ---
> >
> > diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> > index 
> > be427dca5d9afeed2013954472dde3a5430169e0..a468aa5c0c3f20bd62a7afc1d245d64e87be5396
> >  100644
> > --- a/gcc/expmed.cc
> > +++ b/gcc/expmed.cc
> > @@ -740,6 +740,28 @@ store_bit_field_using_insv (const extraction_insn 
> > *insv, rtx op0,
> >    return false;
> >  }
> >  
> > +/* For non-constant vectors wrapping a subreg around the RTX will not make
> > +   the expression expand properly through vec_init.  For constant vectors
> > +   we can because simplification can just extract the element out by
> > +   by merging the values.  This can be done by simplify-rtx and so the
> > +   subreg will be eliminated.  However poly constants require vec_init as
> > +   they are a runtime value.  So only allow the subreg for simple integer
> > +   or floating point constants.  */
> > +
> > +static bool
> > +foldable_value_with_subreg (rtx value)
> > +{
> > +  if (GET_CODE (value) != CONST_VECTOR || const_vec_duplicate_p (value))
> > +    return true;
> > +
> > +  for (unsigned i = 0; i < const_vector_encoded_nelts (value); i++)
> > +    if (!CONST_INT_P (const_vector_elt (value, i))
> > +   && !CONST_DOUBLE_P (const_vector_elt (value, i)))
> > +      return false;
> > +
> > +  return true;
> > +}
> > +
> >  /* A subroutine of store_bit_field, with the same arguments.  Return true
> >     if the operation could be implemented.
> >  
> > @@ -795,7 +817,8 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
> > poly_uint64 bitnum,
> >    /* If the target is a register, overwriting the entire object, or storing
> >       a full-word or multi-word field can be done with just a SUBREG.  */
> >    if (!MEM_P (op0)
> > -      && known_eq (bitsize, GET_MODE_BITSIZE (fieldmode)))
> > +      && known_eq (bitsize, GET_MODE_BITSIZE (fieldmode))
> > +      && foldable_value_with_subreg (value))
> >      {
> >        /* Use the subreg machinery either to narrow OP0 to the required
> >      words or to cope with mode punning between equal-sized modes.
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr120718.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/pr120718.c
> > new file mode 100644
> > index 
> > 0000000000000000000000000000000000000000..cb21d94792f0679a48cc20c3dcdf78c89c05a5c6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr120718.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +#include <arm_sve.h>
> > +typedef int __attribute__((vector_size(8))) v2si;
> > +typedef struct { int x; int y; } A;
> > +void bar(A a);
> > +void foo()
> > +{
> > +    A a;
> > +    *(v2si *)&a = (v2si){0, (int)svcntd_pat(SV_ALL)};
> > +    bar(a);
> > +}
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to