"Roger Sayle" <[email protected]> writes:
> Sorry for the delay in replying.  Richard's e-mails from googlemail
> appear to end up in my spam folder.

Hope this one comes through OK :)

> Another reason for the delay is
> PR 120144/124313, which I'm a little surprised anyone working on MIPS
> wouldn't have noticed.

Yeah, the port seems to be in a bad way.

FTR, I haven't looked at MIPS in years.  And I'm not intending to pick
it back up now :)  I just noticed this change go by on gcc-patches and
wasn't sure about the MIPS-focused parts.

That said, now that I've seen the test results, I might try to tackle
some of the low-hanging fruit...

> But I agree with Richard that my MIPS change did result in a performance 
> regression, indeed the sequence of two shifts by 32 was shown in one of
> the examples of my post and the commit message.  The solution that I've
> been investigating looks like:
>
> diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> index 6a1b63b00d2e..b84e671ee537 100644
> --- a/gcc/config/mips/mips.md
> +++ b/gcc/config/mips/mips.md
> @@ -3946,13 +3946,12 @@
>    [(set_attr "move_type" "signext,load")
>     (set_attr "mode" "SI")])
>
> -;; Combiner patterns for truncate/sign_extend combinations.  The SI
> versions
> -;; use the shift/truncate patterns.
> +;; Combiner patterns for truncate/sign_extend combinations.
>
>  (define_insn_and_split "*extenddi_truncate<mode>"
>    [(set (match_operand:DI 0 "register_operand" "=d")
>         (sign_extend:DI
> -           (truncate:SUBDI (match_operand:DI 1 "register_operand" "d"))))]
> +           (truncate:SHORT (match_operand:DI 1 "register_operand" "d"))))]
>    "TARGET_64BIT && !TARGET_MIPS16 && !ISA_HAS_EXTS"
>    "#"
>    "&& reload_completed"
> @@ -3969,6 +3968,16 @@
>    [(set_attr "move_type" "shift_shift")
>     (set_attr "mode" "DI")])
>
> +;; Like truncdisi2, but with semantics
> +(define_insn "*extenddi_truncatesi"
> +  [(set (match_operand:DI 0 "register_operand" "=d")
> +       (sign_extend:DI
> +           (truncate:SI (match_operand:DI 1 "register_operand" "d"))))]
> +  "TARGET_64BIT"
> +  "sll\t%0,%1,0"
> +  [(set_attr "move_type" "sll0")
> +   (set_attr "mode" "SI")])
> +
>  (define_insn_and_split "*extendsi_truncate<mode>"
>    [(set (match_operand:SI 0 "register_operand" "=d")
>         (sign_extend:SI
>
>
> Notice that this solution contains the explicit sign-extension, allowing it
> to
> be correctly optimized and folded by the middle-end.  The alternative, the
> current situation is that the sign-extension semantics are hidden in the RTL
> and exposed purely as !TRULY_NOOP_TRUNCATION.

Yeah, it's hidden until after RA.  But my argument is that it's still
hidden with the pattern above.  In general, any code generated for:

  A: (set (reg:SI tmp) src)
  B: (set (reg:DI target) (sign_extend:DI (reg:SI tmp)))

is the code for A plus what becomes a nop after RA.  The existence of B
doesn't change what A has to do.  That includes the particular case of a
(truncate:SI ...) src, but truncate isn't special in that respect.

That is, the code for:

  [(set (match_operand:DI 0 "register_operand" "=d")
        (sign_extend:DI
           (truncate:SI (match_operand:DI 1 "register_operand" "d"))))]

should be the same as the code for:

  [(set (match_operand:SI 0 "register_operand" "=d")
        (truncate:SI (match_operand:DI 1 "register_operand" "d")))]

Similarly for addition, etc.  But rather than say so for every operation,
the idea is that a standalone SI->DI sign extension generates no code.

While the RTL is operating on pseudo registers, the truncate and
the sign_extend rtxes need to be present, since removing them would
tend to allow invalid optimisations to be performed later.  (I think
we agree on that part.)  So we do still need an SI->DI sign_extend
at first.  The extendsidi2 pattern is therefore a shell that exists
for semantics but doesn't generate code for register sources.

> Targets such as nvptx are also non-noop truncation, but they don't
> perform sign or zero extension.

Right.  The same goes for 64-bit to 16-bit truncation on MIPS:
it's not a nop, but it doesn't guarantee extension from HI either.
The RTL optimisers must still honour TARGET_TRULY_NOOP_TRUNCATION
regardless of how the target defines mode_rep_extended.

> I'm also investigating whether the mode_rep_extended change is required...
> The problem is that the ABI/target requires/guarantees some invariants, but
> they are not automatically provided by the hardware, but instead must be
> maintained by the RTL optimizers.  Some passes can take advantage of these
> invariants, but other passes that change the RTL need to be aware of these
> invariants and ensure that they are preserved/restored.

mode_rep_extended itself isn't supposed to impose any new requirements
on the RTL optimisers.  It's just supposed to describe something that
flows naturally from requirements described elsewhere (in particular
TARGET_TRULY_NOOP_TRUNCATION).

The documentation could make that clearer.  But I think the MIPS
and old SH64 definitions did/do get it right.

The ABI stuff is described by other macros.

To put it another way: the fact that GCC can optimise things more
when mode_rep_extended == SIGN_EXTEND is the reason why that case
is particularly good at exposing missing TRUNCATEs.  As Andrew said
in one of the PRs, although the older MIPS ISA specs required inputs
to 32-bit instructions to be sign-extended, no MIPS hardware that
I know of actually trapped on invalid register inputs.  And almost
all MIPS hardware does the obvious thing.  (IIRC, SB-1 didn't sign-
extend the result if the inputs weren't sign-extended, but I don't
remember any others caring either way.)

So the only way that missing truncations tend to be visible is via
simulation traps or by mode_rep_extended optimisations.  But those
optimisations are the canary that show that a TRUNCATE is missing.

> But I'm happy to go along with whatever Richard Sandford recommends,
> access to real hardware trumps my fights with libgloss, combined trees and
> cross-builds [all of which are much less stable now than when I wrote the
> original correctness fix last year].

Although I probably do have an old MIPS board lying around somewhere,
it seemed like too much hassle to find and resurrect it :)  I ended up
using qemu to test this patch.

Thanks,
Richard

Reply via email to