Hi Uros,
Thanks for the speedy review.

> From: Uros Bizjak <ubiz...@gmail.com>
> Sent: 17 October 2023 17:38
> 
> On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle <ro...@nextmovesoftware.com>
> wrote:
> >
> >
> > This patch is the backend piece of a solution to PRs 101955 and
> > 106245, that adds a define_insn_and_split to the i386 backend, to
> > perform sign extension of a single (least significant) bit using AND $1 
> > then NEG.
> >
> > Previously, (x<<31)>>31 would be generated as
> >
> >         sall    $31, %eax       // 3 bytes
> >         sarl    $31, %eax       // 3 bytes
> >
> > with this patch the backend now generates:
> >
> >         andl    $1, %eax        // 3 bytes
> >         negl    %eax            // 2 bytes
> >
> > Not only is this smaller in size, but microbenchmarking confirms that
> > it's a performance win on both Intel and AMD; Intel sees only a 2%
> > improvement (perhaps just a size effect), but AMD sees a 7% win.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-10-17  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR middle-end/101955
> >         PR tree-optimization/106245
> >         * config/i386/i386.md (*extv<mode>_1_0): New define_insn_and_split.
> >
> > gcc/testsuite/ChangeLog
> >         PR middle-end/101955
> >         PR tree-optimization/106245
> >         * gcc.target/i386/pr106245-2.c: New test case.
> >         * gcc.target/i386/pr106245-3.c: New 32-bit test case.
> >         * gcc.target/i386/pr106245-4.c: New 64-bit test case.
> >         * gcc.target/i386/pr106245-5.c: Likewise.
> 
> +;; Split sign-extension of single least significant bit as and x,$1;neg
> +x (define_insn_and_split "*extv<mode>_1_0"
> +  [(set (match_operand:SWI48 0 "register_operand" "=r")
> + (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0")
> +    (const_int 1)
> +    (const_int 0)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  ""
> +  "#"
> +  "&& 1"
> 
> No need to use "&&" for an empty insn constraint. Just use "reload_completed" 
> in
> this case.
> 
> +  [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1)))
> +      (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0)))
> +      (clobber (reg:CC FLAGS_REG))])])
> 
> Did you intend to split this after reload? If this is the case, then 
> reload_completed
> is missing.

Because this splitter neither required the allocation of a new pseudo, nor a
hard register assignment, i.e. it's a splitter that can be run before or after
reload, it's written to split "whenever".  If you'd prefer it to only split 
after
reload, I agree a "reload_completed" can be added (alternatively, adding
"ix86_pre_reload_split ()" would also work).

I now see from "*load_tp_<mode>" that "" is perhaps preferred over "&& 1"
In these cases.  Please let me know which you prefer.

Cheers,
Roger


Reply via email to