Peter Bergner via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> The pr87507.c test case regressed due to Segher's commit that added
> -fsplit-wide-types-early.  The issue is that the lower-subreg pass only
> decomposes the TImode code in the example if there is a pseudo reg to pseudo
> reg copy.  When the lower-subreg pass is called late (its old location),
> then combine changes the generated code by adding a TImode pseudo reg to
> pseudo reg copy and lower-subreg successfully decomposes it.
>
> When we run lower-subreg before combine, that copy isn't there so we
> do not decompose our TImode uses.  I'm not sure why we require a pseudo
> to pseudo copy before we decompose things, but changing find_pseudo_copy
> to allow pseudo and hard regs to be copied into a pseudo like below fixes
> the issue.
>
> Ian, do you remember why we don't just decompose all wide types and instead
> require a pseudo to pseudo copy to exist?
>
> This patch survived bootstrap and regtesting on powerpc64le-linux with
> no regressions.
>
>       * lower-subreg.c (find_pseudo_copy): Allow copies from hard registers
>       too.
>
> diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c
> index 4c8bc835f93..c6816a34489 100644
> --- a/gcc/lower-subreg.c
> +++ b/gcc/lower-subreg.c
> @@ -419,7 +419,7 @@ find_pseudo_copy (rtx set)
>  
>    rd = REGNO (dest);
>    rs = REGNO (src);
> -  if (HARD_REGISTER_NUM_P (rd) || HARD_REGISTER_NUM_P (rs))
> +  if (HARD_REGISTER_NUM_P (rd))
>      return false;
>  
>    b = reg_copy_graph[rs];

I guess this would also work if we dropped the rd check instead.
So how about s/||/&&/ instead, to avoid the assymetry?

I agree something like this is a better fix long-term, since we
shouldn't be relying on make_more_copies outside combine.

> Given we're late in stage4, the above patch (assuming it's ok) probably
> shouldn't go in until stage1, since it is changing lower-subreg's behaviour
> slightly.
>
> A different (ie, safer) approach would be to just rerun lower-subreg at
> its old location, regardless of whether we used -fsplit-wide-types-early.
> This way, we are not changing lower-subreg's behaviour, just running it an
> extra time (3 times instead of twice when using -fsplit-wide-types-early).
> I don't think lower-subreg is too expensive to run an extra time and we'd
> only do it when using -fsplit-wide-types-early.  The only downside (if any)
> is that we don't decompose these TImode uses early like the patch above does,
> so combine, etc. can't see what they will eventually become.  This does fix
> the bug and also survives bootstrap and regtesting on powerpc64le-linux
> with no regressions.
>
>       * lower-subreg.c (pass_lower_subreg3::gate): Remove test for
>       flag_split_wide_types_early.
>
> diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c
> index 4c8bc835f93..807ad398b64 100644
> --- a/gcc/lower-subreg.c
> +++ b/gcc/lower-subreg.c
> @@ -1844,8 +1844,7 @@ public:
>    {}
>  
>    /* opt_pass methods: */
> -  virtual bool gate (function *) { return flag_split_wide_types
> -                                       && !flag_split_wide_types_early; }
> +  virtual bool gate (function *) { return flag_split_wide_types != 0; }
>    virtual unsigned int execute (function *)
>      {
>        decompose_multiword_subregs (true);

Looks good to me with the s/ != 0// that Segher mentioned.

With this change, the only remaining function of -fsplit-wide-types-early
is to act as a double lock on one pass.  IMO it'd make more sense to remove
that double lock and make -fsplit-wide-types-early and -fsplit-wide-types
act as independent options, a bit like -fschedule-insns{,2}.

Thanks,
Richard

Reply via email to