Peter Bergner via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > The pr87507.c test case regressed due to Segher's commit that added > -fsplit-wide-types-early. The issue is that the lower-subreg pass only > decomposes the TImode code in the example if there is a pseudo reg to pseudo > reg copy. When the lower-subreg pass is called late (its old location), > then combine changes the generated code by adding a TImode pseudo reg to > pseudo reg copy and lower-subreg successfully decomposes it. > > When we run lower-subreg before combine, that copy isn't there so we > do not decompose our TImode uses. I'm not sure why we require a pseudo > to pseudo copy before we decompose things, but changing find_pseudo_copy > to allow pseudo and hard regs to be copied into a pseudo like below fixes > the issue. > > Ian, do you remember why we don't just decompose all wide types and instead > require a pseudo to pseudo copy to exist? > > This patch survived bootstrap and regtesting on powerpc64le-linux with > no regressions. > > * lower-subreg.c (find_pseudo_copy): Allow copies from hard registers > too. > > diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c > index 4c8bc835f93..c6816a34489 100644 > --- a/gcc/lower-subreg.c > +++ b/gcc/lower-subreg.c > @@ -419,7 +419,7 @@ find_pseudo_copy (rtx set) > > rd = REGNO (dest); > rs = REGNO (src); > - if (HARD_REGISTER_NUM_P (rd) || HARD_REGISTER_NUM_P (rs)) > + if (HARD_REGISTER_NUM_P (rd)) > return false; > > b = reg_copy_graph[rs];
I guess this would also work if we dropped the rd check instead. So how about s/||/&&/ instead, to avoid the assymetry? I agree something like this is a better fix long-term, since we shouldn't be relying on make_more_copies outside combine. > Given we're late in stage4, the above patch (assuming it's ok) probably > shouldn't go in until stage1, since it is changing lower-subreg's behaviour > slightly. > > A different (ie, safer) approach would be to just rerun lower-subreg at > its old location, regardless of whether we used -fsplit-wide-types-early. > This way, we are not changing lower-subreg's behaviour, just running it an > extra time (3 times instead of twice when using -fsplit-wide-types-early). > I don't think lower-subreg is too expensive to run an extra time and we'd > only do it when using -fsplit-wide-types-early. The only downside (if any) > is that we don't decompose these TImode uses early like the patch above does, > so combine, etc. can't see what they will eventually become. This does fix > the bug and also survives bootstrap and regtesting on powerpc64le-linux > with no regressions. > > * lower-subreg.c (pass_lower_subreg3::gate): Remove test for > flag_split_wide_types_early. > > diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c > index 4c8bc835f93..807ad398b64 100644 > --- a/gcc/lower-subreg.c > +++ b/gcc/lower-subreg.c > @@ -1844,8 +1844,7 @@ public: > {} > > /* opt_pass methods: */ > - virtual bool gate (function *) { return flag_split_wide_types > - && !flag_split_wide_types_early; } > + virtual bool gate (function *) { return flag_split_wide_types != 0; } > virtual unsigned int execute (function *) > { > decompose_multiword_subregs (true); Looks good to me with the s/ != 0// that Segher mentioned. With this change, the only remaining function of -fsplit-wide-types-early is to act as a double lock on one pass. IMO it'd make more sense to remove that double lock and make -fsplit-wide-types-early and -fsplit-wide-types act as independent options, a bit like -fschedule-insns{,2}. Thanks, Richard