| Issue | 173419 |
| --- | --- |
| Summary | LoopUnroll epilogue change causes regalloc degradation |
| Labels | new issue |
| Assignees | |
| Reporter | jdenny-ornl |

[@valerydmit previously reported](https://github.com/llvm/llvm-project/pull/156549#issuecomment-3475229746) that PR #156549 (landed as 6d44b9082e42b918a152098ec70ed409c4da8c79) causes a performance degradation due to changes it induces in the behavior of LLVM's register allocator (regalloc). I have attached [f-before-after.zip](https://github.com/user-attachments/files/24318692/f-before-after.zip), which includes:
- `f.ll`: The reproducer that @valerydmit posted there.
- `f-opt-O3-before.ll`: The result of running `opt -O3 --preserve-ll-uselistorder -S f.ll` at b36e762cdb2e90e29f65c7abffc00541addfed3f, which is the parent of the above commit.
- `f-opt-O3-after.ll`: The result of running the same command at the above commit.

Here is the result of `diff -u f-opt-O3-before.ll f-opt-O3-after.ll`, except that, for readability, I have manually eliminated superficial differences (i.e., label renames, predecessor comment changes, and the `@llvm.assume` declaration):
```
@@ -81,19 +81,21 @@
%14 = sub i32 %13, %9
%xtraiter = and i32 %14, 1
%15 = icmp eq i32 %10, %9
- br i1 %15, label %omp_collapsed.exit.loopexit.unr-lcssa, label %omp_collapsed.body.lr.ph.new
+ br i1 %15, label %omp_collapsed.body.epil, label %omp_collapsed.body.lr.ph.new
omp_collapsed.body.lr.ph.new: ; preds = %omp_collapsed.body.lr.ph
%unroll_iter = and i32 %14, -2
br label %omp_collapsed.body
omp_collapsed.exit.loopexit.unr-lcssa:
- %omp_collapsed.iv3.unr = phi i32 [ 0, %omp_collapsed.body.lr.ph ], [ %omp_collapsed.next.1, %omp_collapsed.body ]
%lcmp.mod.not = icmp eq i32 %xtraiter, 0
br i1 %lcmp.mod.not, label %omp_collapsed.exit, label %omp_collapsed.body.epil
omp_collapsed.body.epil:
- %16 = add i32 %omp_collapsed.iv3.unr, %9
+ %omp_collapsed.iv3.epil.init = phi i32 [ 0, %omp_collapsed.body.lr.ph ], [ %omp_collapsed.next.1, %omp_collapsed.exit.loopexit.unr-lcssa ]
+ %lcmp.mod4 = icmp ne i32 %xtraiter, 0
+ call void @llvm.assume(i1 %lcmp.mod4)
+ %16 = add i32 %omp_collapsed.iv3.epil.init, %9
%17 = urem i32 %16, %omp_loop.tripcount22
%18 = udiv i32 %16, %omp_loop.tripcount22
%19 = urem i32 %18, %omp_loop.tripcount11
```
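To make the control-flow change in the diff easier to follow, here is a rough Python model of the two shapes. It is only a sketch under assumptions: the names `run_before`, `run_after`, `trip_count`, and `body` are hypothetical, and it assumes that the `%15` test is true exactly when fewer than two iterations remain and that the loop is guarded so it runs at least once. Neither assumption is confirmed by the diff alone.

```python
def run_before(trip_count, body):
    """Shape before the change: skipping the main loop lands on the
    remainder check (unr-lcssa), which then decides about the epilogue."""
    xtraiter = trip_count & 1          # %xtraiter = and i32 %14, 1
    iv = 0
    if trip_count >= 2:                # ASSUMED meaning of the `%15` test, negated
        unroll_iter = trip_count & -2  # %unroll_iter = and i32 %14, -2
        while iv != unroll_iter:       # main loop, unrolled by 2
            body(iv)
            body(iv + 1)
            iv += 2
    if xtraiter != 0:                  # %lcmp.mod.not, checked on both paths
        body(iv)                       # epilogue iteration


def run_after(trip_count, body):
    """Shape after the change: skipping the main loop branches straight to
    the epilogue, which asserts that a remainder iteration exists."""
    xtraiter = trip_count & 1
    iv = 0
    if trip_count >= 2:
        unroll_iter = trip_count & -2
        while iv != unroll_iter:
            body(iv)
            body(iv + 1)
            iv += 2
        if xtraiter == 0:              # unr-lcssa, now reached only from the main loop
            return
    assert xtraiter != 0               # stands in for llvm.assume(%lcmp.mod4)
    body(iv)                           # epilogue iteration


if __name__ == "__main__":
    for n in range(1, 6):
        before, after = [], []
        run_before(n, before.append)
        run_after(n, after.append)
        print(n, before, after)
```

Under those assumptions both shapes execute the same iterations. The visible differences are that the induction-variable phi moves into the epilogue block and that the epilogue gains a compare feeding `@llvm.assume`.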
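Remarks from regalloc seem to support the original report. After the change, reloads in the loop double (5 to 10), and spills in the function rise from 9 to 15, although virtual register copies drop (19 to 4 in the loop):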
```
$ llc -O3 -pass-remarks-missed=regalloc f-opt-O3-before.ll
remark: <unknown>:0:0: 5 reloads 5.000000e+01 total reloads cost 4 folded reloads 4.000000e+01 total folded reloads cost 19 virtual registers copies 1.900000e+02 total copies cost generated in loop
remark: <unknown>:0:0: 9 spills 5.750000e+00 total spills cost 13 reloads 5.437500e+01 total reloads cost 7 folded reloads 4.125000e+01 total folded reloads cost 34 virtual registers copies 2.014375e+02 total copies cost generated in function
$ llc -O3 -pass-remarks-missed=regalloc f-opt-O3-after.ll
remark: <unknown>:0:0: 10 reloads 1.000000e+02 total reloads cost 5 folded reloads 5.000000e+01 total folded reloads cost 4 virtual registers copies 4.000000e+01 total copies cost generated in loop
remark: <unknown>:0:0: 15 spills 6.062500e+00 total spills cost 21 reloads 1.034375e+02 total reloads cost 8 folded reloads 5.125000e+01 total folded reloads cost 15 virtual registers copies 4.925000e+01 total copies cost generated in function
```
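For anyone who wants to compare these remarks mechanically, the following is a minimal sketch that parses the remark lines above and reports the change in each count. The helper names `parse_remark` and `count_deltas` are hypothetical, and the regular expression assumes the remark wording matches the output shown above exactly.

```python
import re

# Remark lines copied from the llc output above.
BEFORE_LOOP = "remark: <unknown>:0:0: 5 reloads 5.000000e+01 total reloads cost 4 folded reloads 4.000000e+01 total folded reloads cost 19 virtual registers copies 1.900000e+02 total copies cost generated in loop"
AFTER_LOOP = "remark: <unknown>:0:0: 10 reloads 1.000000e+02 total reloads cost 5 folded reloads 5.000000e+01 total folded reloads cost 4 virtual registers copies 4.000000e+01 total copies cost generated in loop"
BEFORE_FUNC = "remark: <unknown>:0:0: 9 spills 5.750000e+00 total spills cost 13 reloads 5.437500e+01 total reloads cost 7 folded reloads 4.125000e+01 total folded reloads cost 34 virtual registers copies 2.014375e+02 total copies cost generated in function"
AFTER_FUNC = "remark: <unknown>:0:0: 15 spills 6.062500e+00 total spills cost 21 reloads 1.034375e+02 total reloads cost 8 folded reloads 5.125000e+01 total folded reloads cost 15 virtual registers copies 4.925000e+01 total copies cost generated in function"

# Matches "<count> <metric> <cost> total <metric> cost" groups in one remark.
REMARK_RE = re.compile(
    r"(\d+) (spills|folded reloads|reloads|virtual registers copies) "
    r"(\S+) total (?:spills|folded reloads|reloads|copies) cost"
)


def parse_remark(line):
    """Return {metric: (count, cost)} for one regalloc remark line."""
    return {
        name: (int(count), float(cost))
        for count, name, cost in REMARK_RE.findall(line)
    }


def count_deltas(before, after):
    """Return {metric: after_count - before_count} for metrics in both remarks."""
    b, a = parse_remark(before), parse_remark(after)
    return {name: a[name][0] - b[name][0] for name in b if name in a}


if __name__ == "__main__":
    print("loop:", count_deltas(BEFORE_LOOP, AFTER_LOOP))
    print("function:", count_deltas(BEFORE_FUNC, AFTER_FUNC))
```

A positive delta means the count grew after the change. The costs are parsed as well, so the same approach extends to comparing the weighted totals.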
The results are the same regardless of which of the above commits is used to build `llc`.