On Tue, 24 Jun 2025, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Tue, 24 Jun 2025, Richard Sandiford wrote:
> >
> >> Richard Biener <rguent...@suse.de> writes:
> >> > On Tue, 24 Jun 2025, Richard Sandiford wrote:
> >> >
> >> >> Tamar Christina <tamar.christ...@arm.com> writes:
> >> >> > store_bit_field_1 has an optimization where if a target is not a 
> >> >> > memory operand
> >> >> > and the entire value is being set from something larger we can just 
> >> >> > wrap a
> >> >> > subreg around the source and emit a move.
> >> >> >
> >> >> > For vector constructors this is however problematic because the 
> >> >> > subreg means
> >> >> > that the expansion of the constructor won't happen through vec_init 
> >> >> > anymore.
> >> >> >
> >> >> > Complicated constructors which aren't natively supported by targets 
> >> >> > then ICE as
> >> >> > they wouldn't have been expanded so recog fails.
> >> >> >
> >> >> > This patch blocks the optimization on non-constant vector 
> >> >> > constructors. Or non-uniform
> >> >> > non-constant vectors. I allowed constant vectors because if I read 
> >> >> > the code right
> >> >> > simplify-rtx should be able to perform the simplification of pulling 
> >> >> > out the element
> >> >> > or merging the constant values.  There are several testcases in 
> >> >> > aarch64-sve-pcs.exp
> >> >> > that test this as well. I allowed uniform non-constant vectors 
> >> >> > because they
> >> >> > would be folded into a vec_select later on.
> >> >> >
> >> >> > Note that codegen is quite horrible, for what should only be an lsr.  
> >> >> > But I'll
> >> >> > address that separately so that this patch is backportable.
> >> >> >
> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> >> >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> >> >> > -m32, -m64 and no issues.
> >> >> >
> >> >> > Ok for master? and GCC 15, 14, 13?
> >> >> 
> >> >> I was discussing this Alex off-list last week, and the fix we talked
> >> >> about there was:
> >> >> 
> >> >> diff --git a/gcc/explow.cc b/gcc/explow.cc
> >> >> index 7799a98053b..8b138f54f75 100644
> >> >> --- a/gcc/explow.cc
> >> >> +++ b/gcc/explow.cc
> >> >> @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op,
> >> >>               machine_mode innermode, poly_uint64 byte)
> >> >>  {
> >> >>    rtx x = simplify_gen_subreg (outermode, op, innermode, byte);
> >> >> -  if (x)
> >> >> +  if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x))))
> >> >>      return x;
> >> >>  
> >> >>    auto *start = get_last_insn ();
> >> >> 
> >> >> The justification is that force_subreg is somewhat like a "subreg
> >> >> version of force_operand", and so should try to avoid returning
> >> >> subregs that force_operand would have replaced.  The force_operand
> >> >> code I mean is:
> >> >
> >> > Yeah, in particular CONSTANT_P isn't sth documented as valid as
> >> > subreg operands, only registers (and memory) are.  But isn't this
> >> > then a bug in simplify_gen_subreg itself, that it creates a SUBREG
> >> > of a non-REG/MEM?
> >> 
> >> I don't think the documentation is correct/up-to-date.  subreg is
> >> de facto used as a general operation, and for example there are
> >> patterns like:
> >> 
> >> (define_insn ""
> >>   [(set (match_operand:QI 0 "general_operand_dst" 
> >> "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg")
> >>         (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" 
> >> "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7")
> >>                                 (const_int 16)) 3))
> >>    (clobber (match_scratch:SI 2 "=&r,&r,&r,&r,&r,&r,&r,&r,&r"))
> >>    (clobber (reg:CC CC_REG))]
> >>   ""
> >>   "mov.w\\t%e1,%f2\;mov.b\\t%w2,%R0"
> >>   [(set_attr "length" "10")])
> >
> > I see.  Is the subreg for such define_insn generated by the middle-end
> > though?
> 
> I assume it was written to match something that combine could generate.
> Whether it still does in another question.
> 
> >> (from h8300).  This is also why simplify_gen_subreg has:
> >> 
> >>   if (GET_CODE (op) == SUBREG
> >>       || GET_CODE (op) == CONCAT
> >>       || GET_MODE (op) == VOIDmode)
> >>     return NULL_RTX;
> >> 
> >>   if (MODE_COMPOSITE_P (outermode)
> >>       && (CONST_SCALAR_INT_P (op)
> >>      || CONST_DOUBLE_AS_FLOAT_P (op)
> >>      || CONST_FIXED_P (op)
> >>      || GET_CODE (op) == CONST_VECTOR))
> >>     return NULL_RTX;
> >> 
> >> rather than the !REG_P (op) && !MEM_P (op) that the documentation
> >> would imply.
> >
> > So maybe we can drop the MODE_COMPOSITE_P check here, as said on IRC
> > we don't seem to ever legitmize constants wrapped in a SUBREG, so
> > we shouldn't generate a SUBREG of a constant (in the middle-end)?
> 
> Hmm, yeah, maybe.  I'd originally rejected that because I assumed
> the MODE_COMPOSITE_P was there for a reason.  But looking at the
> history, the check came from c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f,
> where the reason given was:
> 
>     As for the simplify_gen_subreg change, I think it would be desirable
>     to just avoid creating SUBREGs of constants on all targets and for all
>     constants, if simplify_immed_subreg simplified, fine, otherwise punt,
>     but as we are late in GCC11 development, the patch instead guards this
>     behavior on MODE_COMPOSITE_P (outermode) - i.e. only conversions to
>     powerpc{,64,64le} double double long double - and only for the cases where
>     simplify_immed_subreg was called.

So then I'd say we want to give

  if (CONSTANT_P (op))
    return NULL_RTX;

a try?

Richard.

Reply via email to