On Thu, Aug 3, 2023 at 12:18 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch is a conservative fix for PR target/110792, a wrong-code
> regression affecting doubleword rotations by BITS_PER_WORD, which
> effectively swaps the highpart and lowpart words, when the source to be
> rotated resides in memory. The issue is that if the register used to
> hold the lowpart of the destination is mentioned in the address of
> the memory operand, the current define_insn_and_split unintentionally
> clobbers it before reading the highpart.
>
> Hence, for the testcase, the incorrectly generated code looks like:
>
>         salq    $4, %rdi                // calculate address
>         movq    WHIRL_S+8(%rdi), %rdi   // load highpart, clobbering addr
>         movq    WHIRL_S(%rdi), %rbp     // load lowpart via clobbered addr
>
> Traditionally, the textbook way to fix this would be to add an
> explicit early clobber to the instruction's constraints.
>
>  (define_insn_and_split "<insn>32di2_doubleword"
> - [(set (match_operand:DI 0 "register_operand" "=r,r,r")
> + [(set (match_operand:DI 0 "register_operand" "=r,r,&r")
>         (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "0,r,o")
>                        (const_int 32)))]
>
> but unfortunately this currently generates significantly worse code,
> due to a strange choice of reloads (effectively memcpy), which ends up
> looking like:
>
>         salq    $4, %rdi                // calculate address
>         movdqa  WHIRL_S(%rdi), %xmm0    // load the double word in SSE reg.
>         movaps  %xmm0, -16(%rsp)        // store the SSE reg back to the stack
>         movq    -8(%rsp), %rdi          // load highpart
>         movq    -16(%rsp), %rbp         // load lowpart
>
> Note that reload's "&" doesn't distinguish between the memory operand
> itself being early-clobbered and the registers used in its addressing
> mode being early-clobbered.
>
> The fix proposed in this patch is to remove the third alternative, which
> allowed an offsettable memory operand, forcing reload to place the
> operand into a register before the rotation.  This results in:
>
>         salq    $4, %rdi
>         movq    WHIRL_S(%rdi), %rax
>         movq    WHIRL_S+8(%rdi), %rdi
>         movq    %rax, %rbp
>
> I believe there's a more advanced solution: swap the order of the two
> loads (if the first destination register is mentioned in the address),
> or insert an lea insn (if both destination registers are mentioned in
> the address).  But this fix is a minimal "safe" solution that should
> hopefully be suitable for backporting.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-08-02  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/110792
>         * config/i386/i386.md (<any_rotate>ti3): For rotations by 64 bits
>         place operand in a register before gen_<insn>64ti2_doubleword.
>         (<any_rotate>di3): Likewise, for rotations by 32 bits, place
>         operand in a register before gen_<insn>32di2_doubleword.
>         (<any_rotate>32di2_doubleword): Constrain operand to be in register.
>         (<any_rotate>64ti2_doubleword): Likewise.
>
> gcc/testsuite/ChangeLog
>         PR target/110792
>         * g++.target/i386/pr110792.C: New 32-bit C++ test case.
>         * gcc.target/i386/pr110792.c: New 64-bit C test case.

OK.

Thanks,
Uros.
>
>
> Thanks in advance,
> Roger
> --
>
