[llvm-bugs] [Bug 135486] [LLVM] APInt::tcAdd has quite poor codegen.

LLVM Bugs via llvm-bugs Sat, 12 Apr 2025 01:57:07 -0700

Issue	135486
Summary	[LLVM] APInt::tcAdd has quite poor codegen.
Labels	new issue
Assignees
Reporter	Ralender

Here is one iteration of the loop in `APInt::tcAdd`, compiled with clang trunk at `-O3`:
```asm
; start of iteration
  89: 4c 8b 14 cf        movq   (%rdi,%rcx,8), %r10     ; load dst[i]
  8d: 4c 8b 0c ce        movq   (%rsi,%rcx,8), %r9      ; load rhs[i]
  91: 48 85 d2           testq  %rdx, %rdx              ; test whether there is an incoming carry
  94: 74 1a              je     0xb0 <_ZN4llvm5APInt5tcAddEPmPKmmj+0xb0>

  96: 4f 8d 4c 0a 01     leaq   0x1(%r10,%r9), %r9      ; add with carry
  9b: 4d 39 d1           cmpq   %r10, %r9
  9e: 41 0f 96 c2        setbe  %r10b                   ; put the carry for the next iteration in r10b
  a2: eb 13              jmp    0xb7 <_ZN4llvm5APInt5tcAddEPmPKmmj+0xb7>

  b0: 4d 01 d1           addq   %r10, %r9               ; add without carry
  b3: 41 0f 92 c2        setb   %r10b                   ; put the carry for the next iteration in r10b

  b7: 4c 89 0c cf        movq   %r9, (%rdi,%rcx,8)      ; write the result back to memory
; next iteration (unrolled)
  bb: 4c 8b 4c cf 08     movq   0x8(%rdi,%rcx,8), %r9
  c0: 48 8b 54 ce 08     movq   0x8(%rsi,%rcx,8), %rdx
  c5: 45 84 d2           testb  %r10b, %r10b
; ...
```
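For reference, the source loop behind this listing is roughly the following. It is a standalone sketch modeled on `APInt::tcAdd` in `llvm/lib/Support/APInt.cpp`, not a verbatim copy: `WordType` is fixed to `uint64_t` and the name `tcAddBranchy` is hypothetical, so check it against the LLVM source before relying on details. The `if (c)` arm corresponds to the `leaq`/`cmpq`/`setbe` path in the listing above and the `else` arm to the `addq`/`setb` path.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone version of the tcAdd loop: dst += rhs + c over
// `parts` 64-bit words (least significant first). Returns the final carry.
uint64_t tcAddBranchy(uint64_t *dst, const uint64_t *rhs, uint64_t c,
                      unsigned parts) {
  assert(c <= 1 && "carry must be 0 or 1");
  for (unsigned i = 0; i < parts; i++) {
    uint64_t l = dst[i];
    if (c) {
      // Carry in: the sum l + rhs[i] + 1 wrapped iff the result is <= l.
      dst[i] += rhs[i] + 1;
      c = (dst[i] <= l);
    } else {
      // No carry in: the sum l + rhs[i] wrapped iff the result is < l.
      dst[i] += rhs[i];
      c = (dst[i] < l);
    }
  }
  return c;
}
```

For example, adding `{~0ULL, 0}` and `{1, 0}` with no incoming carry leaves `{0, 1}` in `dst` and returns 0. The branch on `c` in the source is what survives as the `testq`/`je` pair in the generated code.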
For reference, here is what I think optimal x86-64 code looks like:
```asm
; start of iteration
; the carry is kept in CF
  12a1: 4a 8b 4c ce f0     movq   -0x10(%rsi,%r9,8), %rcx     ; load rhs[i]
  12a6: 4a 11 4c cf f0     adcq   %rcx, -0x10(%rdi,%r9,8)     ; dst[i] = dst[i] + rhs[i] + carry
; next iteration (unrolled)
  12ab: 4a 8b 4c ce e8     movq   -0x18(%rsi,%r9,8), %rcx
  12b0: 4a 11 4c cf e8     adcq   %rcx, -0x18(%rdi,%r9,8)
; ...
```
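To make the semantics of this reference loop explicit: each `adcq` adds the source, the destination and the incoming CF, stores the low 64 bits, and leaves the carry out of bit 63 in CF for the next word. The sketch below models one such step in C++ using the GCC/Clang `unsigned __int128` extension; `adc64` and `AdcResult` are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Result of one modeled `adcq` step.
struct AdcResult {
  uint64_t sum;    // low 64 bits, i.e. the value stored back to dst[i]
  uint64_t carry;  // carry out of bit 63 (the new CF), 0 or 1
};

// Models dst + src + CF with a 65-bit intermediate sum.
AdcResult adc64(uint64_t dst, uint64_t src, uint64_t carry_in) {
  assert(carry_in <= 1 && "CF is a single bit");
  unsigned __int128 wide =
      static_cast<unsigned __int128>(dst) + src + carry_in;
  return {static_cast<uint64_t>(wide), static_cast<uint64_t>(wide >> 64)};
}
```

Chaining `adc64` over every word, feeding each `carry` into the next call, is exactly what the two unrolled `movq`/`adcq` pairs above do, except that the carry never leaves CF.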

Things that could be done better:
 - One side of the branch detects overflow with a `cmp` instead of using CF.
 - The branch is not eliminated.
 - `adc` is not used.
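As an illustration of the first two points, the per-word carry can be written without branching on the incoming carry. The sketch below swaps in the GCC/Clang builtin `__builtin_add_overflow` to obtain each carry directly (clang also provides `__builtin_addcll` for this pattern); it is not how LLVM's `tcAdd` is written, and the name `tcAddBranchless` is hypothetical. Whether current clang lowers this to a single `adcq` chain is not verified here; that lowering is what this issue is about.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical branch-free variant: dst += rhs + c over `parts` 64-bit
// words (least significant first). Returns the final carry.
uint64_t tcAddBranchless(uint64_t *dst, const uint64_t *rhs, uint64_t c,
                         unsigned parts) {
  assert(c <= 1 && "carry must be 0 or 1");
  for (unsigned i = 0; i < parts; i++) {
    uint64_t partial, sum;
    // Carry out of dst[i] + rhs[i].
    bool c1 = __builtin_add_overflow(dst[i], rhs[i], &partial);
    // Carry out of adding the incoming carry. If c1 is set, partial is at
    // most 2^64 - 2, so at most one of c1 and c2 can be set.
    bool c2 = __builtin_add_overflow(partial, c, &sum);
    dst[i] = sum;
    c = static_cast<uint64_t>(c1 | c2);
  }
  return c;
}
```

The design choice here is to read the carry from the overflow result rather than re-deriving it with a comparison, which matches the information the hardware already exposes in CF.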

I wrote a benchmark to get an idea of how big the impact is (average time for 4096 iterations):
 - `APInt::tcAdd`: 1728.4 ns
 - optimized code: 1045.6 ns

That is roughly a 1.65× speedup for the optimized version.

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs