https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125585

            Bug ID: 125585
           Summary: [x86-64] `(__int128)a*b + c` carry chain: redundant
                    high-word zeroing
           Product: gcc
           Version: 16.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chfast at gmail dot com
  Target Milestone: ---

When a 64-bit carry is added to an `__int128` product, GCC materializes the
full 128-bit addend `{c,0}`: it zeroes a register for the high word (`xor`)
and adds it with `adc reg, reg` instead of the carry-only `adc reg, 0`. It
also shuffles the carry word through extra `mov`s.


    using u64 = unsigned long;
    using u128 = unsigned __int128;
    void mul(u64 t[3], const u64 x[2], u64 y) {
        u128 p = (u128)x[0] * y;
        t[0] = (u64)p;
        u64 c = (u64)(p >> 64);
        p = (u128)x[1] * y + c;
        t[1] = (u64)p;
        t[2] = (u64)(p >> 64);
    }

GCC 16 -O3 emits:

        mov     rcx, rdx
        mov     rax, rdx
        mov     r9, rsi
        mov     r8, rdi
        mul     QWORD PTR [rsi]
        mov     QWORD PTR [rdi], rax
        mov     rax, rdx
        xor     edx, edx
        mov     rsi, rax
        mov     rdi, rdx
        mov     rax, rcx
        mul     QWORD PTR [r9+8]
        add     rax, rsi
        adc     rdx, rdi
        mov     QWORD PTR [r8+8], rax
        mov     QWORD PTR [r8+16], rdx
        ret

Expected output (as emitted by clang):

        mov     rcx, rdx
        mov     rax, rdx
        mul     qword ptr [rsi]
        mov     r8, rdx
        mov     qword ptr [rdi], rax
        mov     rax, rcx
        mul     qword ptr [rsi + 8]
        add     rax, r8
        adc     rdx, 0
        mov     qword ptr [rdi + 8], rax
        mov     qword ptr [rdi + 16], rdx
        ret

https://godbolt.org/z/eY6affEc1
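
For reference, the carry-only form can also be spelled out by hand in C++. This is only a sketch to illustrate the expected lowering, not a proposed workaround: `mul_ref` repeats the test case above, while `mul_carry_only` and `same_result` are hypothetical helper names introduced here. It assumes a target with `unsigned __int128`. The high-word increment cannot overflow, because `(2^64-1)^2 + (2^64-1) < 2^128`.

```cpp
#include <cstdint>

using u64 = std::uint64_t;
using u128 = unsigned __int128;

// The test case from the report, unchanged apart from the name.
void mul_ref(u64 t[3], const u64 x[2], u64 y) {
    u128 p = (u128)x[0] * y;
    t[0] = (u64)p;
    u64 c = (u64)(p >> 64);
    p = (u128)x[1] * y + c;
    t[1] = (u64)p;
    t[2] = (u64)(p >> 64);
}

// Hypothetical hand-written equivalent: add the carry word to the low
// half only, then propagate the single carry bit into the high half.
void mul_carry_only(u64 t[3], const u64 x[2], u64 y) {
    u128 p0 = (u128)x[0] * y;
    t[0] = (u64)p0;
    u64 c = (u64)(p0 >> 64);

    u128 p1 = (u128)x[1] * y;
    u64 lo = (u64)p1;
    u64 hi = (u64)(p1 >> 64);

    u64 sum = lo + c;        // corresponds to: add rax, r8
    u64 carry = sum < c;     // 1 if the 64-bit addition wrapped
    t[1] = sum;
    t[2] = hi + carry;       // corresponds to: adc rdx, 0
}

// Returns true when both variants produce the same three words.
bool same_result(const u64 x[2], u64 y) {
    u64 a[3], b[3];
    mul_ref(a, x, y);
    mul_carry_only(b, x, y);
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
}
```

Both functions compute the same 192-bit result, so the `xor` and register-register `adc` in the GCC output are not needed for correctness.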
