On Tue, 24 Jun 2025, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > On Tue, 24 Jun 2025, Richard Sandiford wrote: > > > >> Richard Biener <rguent...@suse.de> writes: > >> > On Tue, 24 Jun 2025, Richard Sandiford wrote: > >> > > >> >> Tamar Christina <tamar.christ...@arm.com> writes: > >> >> > store_bit_field_1 has an optimization where if a target is not a > >> >> > memory operand > >> >> > and the entire value is being set from something larger we can just > >> >> > wrap a > >> >> > subreg around the source and emit a move. > >> >> > > >> >> > For vector constructors this is however problematic because the > >> >> > subreg means > >> >> > that the expansion of the constructor won't happen through vec_init > >> >> > anymore. > >> >> > > >> >> > Complicated constructors which aren't natively supported by targets > >> >> > then ICE as > >> >> > they wouldn't have been expanded so recog fails. > >> >> > > >> >> > This patch blocks the optimization on non-constant vector > >> >> > constructors. Or non-uniform > >> >> > non-constant vectors. I allowed constant vectors because if I read > >> >> > the code right > >> >> > simplify-rtx should be able to perform the simplification of pulling > >> >> > out the element > >> >> > or merging the constant values. There are several testcases in > >> >> > aarch64-sve-pcs.exp > >> >> > that test this as well. I allowed uniform non-constant vectors > >> >> > because they > >> >> > would be folded into a vec_select later on. > >> >> > > >> >> > Note that codegen is quite horrible, for what should only be an lsr. > >> >> > But I'll > >> >> > address that separately so that this patch is backportable. > >> >> > > >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu, > >> >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > >> >> > -m32, -m64 and no issues. > >> >> > > >> >> > Ok for master? and GCC 15, 14, 13? > >> >> > >> >> I was discussing this Alex off-list last week, and the fix we talked > >> >> about there was: > >> >> > >> >> diff --git a/gcc/explow.cc b/gcc/explow.cc > >> >> index 7799a98053b..8b138f54f75 100644 > >> >> --- a/gcc/explow.cc > >> >> +++ b/gcc/explow.cc > >> >> @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op, > >> >> machine_mode innermode, poly_uint64 byte) > >> >> { > >> >> rtx x = simplify_gen_subreg (outermode, op, innermode, byte); > >> >> - if (x) > >> >> + if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x)))) > >> >> return x; > >> >> > >> >> auto *start = get_last_insn (); > >> >> > >> >> The justification is that force_subreg is somewhat like a "subreg > >> >> version of force_operand", and so should try to avoid returning > >> >> subregs that force_operand would have replaced. The force_operand > >> >> code I mean is: > >> > > >> > Yeah, in particular CONSTANT_P isn't sth documented as valid as > >> > subreg operands, only registers (and memory) are. But isn't this > >> > then a bug in simplify_gen_subreg itself, that it creates a SUBREG > >> > of a non-REG/MEM? > >> > >> I don't think the documentation is correct/up-to-date. subreg is > >> de facto used as a general operation, and for example there are > >> patterns like: > >> > >> (define_insn "" > >> [(set (match_operand:QI 0 "general_operand_dst" > >> "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg") > >> (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" > >> "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7") > >> (const_int 16)) 3)) > >> (clobber (match_scratch:SI 2 "=&r,&r,&r,&r,&r,&r,&r,&r,&r")) > >> (clobber (reg:CC CC_REG))] > >> "" > >> "mov.w\\t%e1,%f2\;mov.b\\t%w2,%R0" > >> [(set_attr "length" "10")]) > > > > I see. Is the subreg for such define_insn generated by the middle-end > > though? > > I assume it was written to match something that combine could generate. > Whether it still does in another question. > > >> (from h8300). This is also why simplify_gen_subreg has: > >> > >> if (GET_CODE (op) == SUBREG > >> || GET_CODE (op) == CONCAT > >> || GET_MODE (op) == VOIDmode) > >> return NULL_RTX; > >> > >> if (MODE_COMPOSITE_P (outermode) > >> && (CONST_SCALAR_INT_P (op) > >> || CONST_DOUBLE_AS_FLOAT_P (op) > >> || CONST_FIXED_P (op) > >> || GET_CODE (op) == CONST_VECTOR)) > >> return NULL_RTX; > >> > >> rather than the !REG_P (op) && !MEM_P (op) that the documentation > >> would imply. > > > > So maybe we can drop the MODE_COMPOSITE_P check here, as said on IRC > > we don't seem to ever legitmize constants wrapped in a SUBREG, so > > we shouldn't generate a SUBREG of a constant (in the middle-end)? > > Hmm, yeah, maybe. I'd originally rejected that because I assumed > the MODE_COMPOSITE_P was there for a reason. But looking at the > history, the check came from c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f, > where the reason given was: > > As for the simplify_gen_subreg change, I think it would be desirable > to just avoid creating SUBREGs of constants on all targets and for all > constants, if simplify_immed_subreg simplified, fine, otherwise punt, > but as we are late in GCC11 development, the patch instead guards this > behavior on MODE_COMPOSITE_P (outermode) - i.e. only conversions to > powerpc{,64,64le} double double long double - and only for the cases where > simplify_immed_subreg was called.
So then I'd say we want to give if (CONSTANT_P (op)) return NULL_RTX; a try? Richard.