Hi Uros,

I've tried your suggestions to see what would happen.
Alas, allowing both operands to (i386's) widening multiplications
to be  nonimmediate_operand results in 90 additional testsuite
unexpected failures", and 41 unresolved testcase, around things
like:

gcc.c-torture/compile/di.c:6:1: error: unrecognizable insn:
(insn 14 13 15 2 (parallel [
            (set (reg:DI 98 [ _3 ])
                (mult:DI (zero_extend:DI (mem/c:SI (plus:SI (reg/f:SI 93 
virtual-stack-vars)
                                (const_int -8 [0xfffffffffffffff8])) [1 a+0 S4 
A64]))
                    (zero_extend:DI (mem/c:SI (plus:SI (reg/f:SI 93 
virtual-stack-vars)
                                (const_int -16 [0xfffffffffffffff0])) [1 b+0 S4 
A64]))))
            (clobber (reg:CC 17 flags))
        ]) "gcc.c-torture/compile/di.c":5:12 -1
     (nil))
during RTL pass: vregs
gcc.c-torture/compile/di.c:6:1: internal compiler error: in extract_insn, at 
recog.cc:2791

In my experiments, I've used nonimmediate_operand instead of general_operand,
as a zero_extend of an immediate_operand, like const_int, would be 
non-canonical.

In short, it's ok (common) for '%' to apply to operands with different 
predicates;
reload will only swap things if the operand's predicates/constraints remain 
consistent.
For example, see i386.c's *add<mode>_1 pattern.  And as shown above it can't
be left to (until) reload to decide which "mem" gets loaded into a register 
(which
would be nice), as some passes before reload check both predicates and 
constraints.

My original patch fixes PR 110511, using the same peephole2 idiom as already
used elsewhere in i386.md.  Ok for mainline?

> -----Original Message-----
> From: Uros Bizjak <ubiz...@gmail.com>
> Sent: 19 October 2023 18:02
> To: Roger Sayle <ro...@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] PR target/110511: Fix reg allocation for widening
> multiplications.
> 
> On Tue, Oct 17, 2023 at 9:05 PM Roger Sayle <ro...@nextmovesoftware.com>
> wrote:
> >
> >
> > This patch contains clean-ups of the widening multiplication patterns
> > in i386.md, and provides variants of the existing highpart
> > multiplication
> > peephole2 transformations (that tidy up register allocation after
> > reload), and thereby fixes PR target/110511, which is a superfluous
> > move instruction.
> >
> > For the new test case, compiled on x86_64 with -O2.
> >
> > Before:
> > mulx64: movabsq $-7046029254386353131, %rcx
> >         movq    %rcx, %rax
> >         mulq    %rdi
> >         xorq    %rdx, %rax
> >         ret
> >
> > After:
> > mulx64: movabsq $-7046029254386353131, %rax
> >         mulq    %rdi
> >         xorq    %rdx, %rax
> >         ret
> >
> > The clean-ups are (i) that operand 1 is consistently made
> > register_operand and operand 2 becomes nonimmediate_operand, so that
> > predicates match the constraints, (ii) the representation of the BMI2
> > mulx instruction is updated to use the new umul_highpart RTX, and
> > (iii) because operands
> > 0 and 1 have different modes in widening multiplications, "a" is a
> > more appropriate constraint than "0" (which avoids spills/reloads
> > containing SUBREGs).  The new peephole2 transformations are based upon
> > those at around line 9951 of i386.md, that begins with the comment ;;
> > Highpart multiplication peephole2s to tweak register allocation.
> > ;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx  ->  mov imm,%rax; imulq
> > %rdi
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-10-17  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/110511
> >         * config/i386/i386.md (<u>mul<mode><dwi>3): Make operands 1 and
> >         2 take "regiser_operand" and "nonimmediate_operand" respectively.
> >         (<u>mulqihi3): Likewise.
> >         (*bmi2_umul<mode><dwi>3_1): Operand 2 needs to be register_operand
> >         matching the %d constraint.  Use umul_highpart RTX to represent
> >         the highpart multiplication.
> >         (*umul<mode><dwi>3_1):  Operand 2 should use regiser_operand
> >         predicate, and "a" rather than "0" as operands 0 and 2 have
> >         different modes.
> >         (define_split): For mul to mulx conversion, use the new
> >         umul_highpart RTX representation.
> >         (*mul<mode><dwi>3_1):  Operand 1 should be register_operand
> >         and the constraint %a as operands 0 and 1 have different modes.
> >         (*<u>mulqihi3_1): Operand 1 should be register_operand matching
> >         the constraint %0.
> >         (define_peephole2): Providing widening multiplication variants
> >         of the peephole2s that tweak highpart multiplication register
> >         allocation.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/110511
> >         * gcc.target/i386/pr110511.c: New test case.
> >
> 
>  (define_insn "*bmi2_umul<mode><dwi>3_1"
>    [(set (match_operand:DWIH 0 "register_operand" "=r")
>      (mult:DWIH
> -      (match_operand:DWIH 2 "nonimmediate_operand" "%d")
> +      (match_operand:DWIH 2 "register_operand" "%d")
>        (match_operand:DWIH 3 "nonimmediate_operand" "rm")))
>     (set (match_operand:DWIH 1 "register_operand" "=r")
> 
> This will fail. Because of %, both predicates must allow nonimmediate_operand,
> since RA can swap operand constraints due to %.
> 
> @@ -9747,7 +9743,7 @@
>    [(set (match_operand:<DWI> 0 "register_operand" "=r,A")
>      (mult:<DWI>
>        (zero_extend:<DWI>
> -        (match_operand:DWIH 1 "nonimmediate_operand" "%d,0"))
> +        (match_operand:DWIH 1 "register_operand" "%d,a"))
>        (zero_extend:<DWI>
>          (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"))))
>     (clobber (reg:CC FLAGS_REG))]
> 
> The same here, although changing "0" to "a" is correct, but the oversight is
> benign. "A" reads as "ad", and the first alternative already takes "d".
> 
> +;; Widening multiplication peephole2s to tweak register allocation.
> +;; mov imm,%rdx; mov %rdi,%rax; mulq %rdx  ->  mov imm,%rax; mulq %rdi
> 
> Maybe instead of peephole2s, we allow general_operands in the instruction 
> (both
> inputs) and rely on the RA to fulfill the constraint? Would this work?
> 
> Uros.

Reply via email to