https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125585
Bug ID: 125585
Summary: [x86-64] `(__int128)a*b + c` carry chain: redundant high-word zeroing
Product: gcc
Version: 16.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---
When a 64-bit carry word is added to an `__int128` product, GCC materializes the
full 128-bit addend `{c, 0}` (an extra `xor` to zero its high word, then a
register-register `adc`) instead of propagating only the carry with `adc reg, 0`.
It also moves the carry word through extra `mov`s.
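The carry-only form is sound because the 128-bit sum can never overflow: for 64-bit x, y and c, the largest possible value is (2^64 - 1)^2 + (2^64 - 1) = 2^128 - 2^64, so adding c can change the high word only by the single carry out of the low-word add. A minimal compile-time check of that bound, assuming an LP64 target where `unsigned long` is 64 bits (as in the reproducer) and a compiler that accepts `unsigned __int128` in constant expressions:

```cpp
#include <limits>

using u64 = unsigned long;          // LP64 assumption, as in the reproducer
using u128 = unsigned __int128;

constexpr u64 kMax = std::numeric_limits<u64>::max();   // 2^64 - 1
constexpr u128 kMaxProduct = (u128)kMax * kMax;         // 2^128 - 2^65 + 1
constexpr u128 kMaxSum = kMaxProduct + kMax;            // 2^128 - 2^64

// No wrap-around: even the largest x*y + c fits in 128 bits.
static_assert(kMaxSum > kMaxProduct, "x*y + c overflowed 128 bits");
// The high word of any 64x64 product is at most 2^64 - 2 ...
static_assert((u64)(kMaxProduct >> 64) == kMax - 1, "unexpected high word");
// ... so adding the carry (at most 1) from the low-word add always fits in it.
static_assert((u64)(kMaxSum >> 64) == kMax && (u64)kMaxSum == 0,
              "unexpected maximal sum");
```

Because the whole check runs at compile time, the file compiling at all is the evidence.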
using u64 = unsigned long;
using u128 = unsigned __int128;
void mul(u64 t[3], const u64 x[2], u64 y) {
    u128 p = (u128)x[0] * y;
    t[0] = (u64)p;
    u64 c = (u64)(p >> 64);
    p = (u128)x[1] * y + c;
    t[1] = (u64)p;
    t[2] = (u64)(p >> 64);
}
GCC 16 -O3 emits:
mov rcx, rdx
mov rax, rdx
mov r9, rsi
mov r8, rdi
mul QWORD PTR [rsi]
mov QWORD PTR [rdi], rax
mov rax, rdx
xor edx, edx
mov rsi, rax
mov rdi, rdx
mov rax, rcx
mul QWORD PTR [r9+8]
add rax, rsi
adc rdx, rdi
mov QWORD PTR [r8+8], rax
mov QWORD PTR [r8+16], rdx
ret
Expected output, matching what clang emits:
mov rcx, rdx
mov rax, rdx
mul qword ptr [rsi]
mov r8, rdx
mov qword ptr [rdi], rax
mov rax, rcx
mul qword ptr [rsi + 8]
add rax, r8
adc rdx, 0
mov qword ptr [rdi + 8], rax
mov qword ptr [rdi + 16], rdx
ret
https://godbolt.org/z/eY6affEc1
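To make the expected sequence concrete, here is a limb-wise C++ model of what clang's instructions compute (`mul` into `rdx:rax`, then `add rax, r8` and `adc rdx, 0`), checked against the reproducer's `__int128` semantics. `mul_carry_only` and `check` are hypothetical names for this sketch. It models the arithmetic of the instructions only; it is not a claim about how either compiler lowers the code, and writing the source this way is not confirmed to change GCC's output. As in the report, it assumes an LP64 target.

```cpp
#include <cassert>

using u64 = unsigned long;          // LP64 assumption, as in the report
using u128 = unsigned __int128;

// The reproducer from the report: t = (x[1]:x[0]) * y as three 64-bit limbs.
void mul(u64 t[3], const u64 x[2], u64 y) {
    u128 p = (u128)x[0] * y;
    t[0] = (u64)p;
    u64 c = (u64)(p >> 64);
    p = (u128)x[1] * y + c;
    t[1] = (u64)p;
    t[2] = (u64)(p >> 64);
}

// Hypothetical limb-wise model of the carry-only sequence clang emits.
void mul_carry_only(u64 t[3], const u64 x[2], u64 y) {
    u128 p0 = (u128)x[0] * y;        // mul qword ptr [rsi]
    t[0] = (u64)p0;                  // mov qword ptr [rdi], rax
    u64 c = (u64)(p0 >> 64);         // mov r8, rdx
    u128 p1 = (u128)x[1] * y;        // mul qword ptr [rsi + 8]
    u64 lo = (u64)p1;                // rax
    u64 hi = (u64)(p1 >> 64);        // rdx
    lo += c;                         // add rax, r8 (may wrap)
    u64 carry = lo < c ? 1 : 0;      // CF: set iff the low-word add wrapped
    hi += carry;                     // adc rdx, 0 (cannot wrap: x*y + c < 2^128)
    t[1] = lo;
    t[2] = hi;
}

// Asserts that the model and the __int128 source agree for one input.
void check(const u64 x[2], u64 y) {
    u64 a[3], b[3];
    mul(a, x, y);
    mul_carry_only(b, x, y);
    assert(a[0] == b[0] && a[1] == b[1] && a[2] == b[2]);
}
```

For example, `x = {~0ul, 1}` with `y = ~0ul` exercises the carry path: the low-word add wraps, so the result is `{1, 2^64 - 3, 1}`, with the high word receiving exactly the one carry that `adc rdx, 0` supplies.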