Issue 87721
Summary "Interference" assertion in SplitKit - bisected to a SCEV change and isolated to AMDGPU division expansion
Labels new issue
Assignees
Reporter krzysz00
## The issue
`llc` crashes as follows on the input attached below:
```
llc: /home/kdrewnia/llvm-project/llvm/lib/CodeGen/SplitKit.cpp:1662: void llvm::SplitEditor::splitLiveThroughBlock(unsigned int, unsigned int, SlotIndex, unsigned int, SlotIndex): Assertion `(!LeaveBefore || Idx <= LeaveBefore) && "Interference"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -mattr=+sramecc,-xnack ./reproducer.ll -o -
1.      Running pass 'CallGraph Pass Manager' on module './reproducer.ll'.
2.      Running pass 'Greedy Register Allocator' on function '@rock_gemm'
 [...abort...]
#13 0x00000000034492f7 llvm::SplitEditor::splitLiveThroughBlock(unsigned int, unsigned int, llvm::SlotIndex, unsigned int, llvm::SlotIndex) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/SplitKit.cpp:1668:5
#14 0x00000000033a1630 llvm::RAGreedy::splitAroundRegion(llvm::LiveRangeEdit&, llvm::ArrayRef<unsigned int>) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:0:11
#15 0x00000000033a263d llvm::RAGreedy::doRegionSplit(llvm::LiveInterval const&, unsigned int, bool, llvm::SmallVectorImpl<llvm::Register>&) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:0:3
#16 0x00000000033a1eff llvm::RAGreedy::tryRegionSplit(llvm::LiveInterval const&, llvm::AllocationOrder&, llvm::SmallVectorImpl<llvm::Register>&) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:1093:1
#17 0x00000000033a6b01 llvm::RAGreedy::trySplit(llvm::LiveInterval const&, llvm::AllocationOrder&, llvm::SmallVectorImpl<llvm::Register>&, llvm::SmallSet<llvm::Register, 16u, std::less<llvm::Register>> const&) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:1827:26
#18 0x00000000033a8ce5 llvm::RAGreedy::selectOrSplitImpl(llvm::LiveInterval const&, llvm::SmallVectorImpl<llvm::Register>&, llvm::SmallSet<llvm::Register, 16u, std::less<llvm::Register>>&, llvm::SmallVector<std::pair<llvm::LiveInterval const*, llvm::MCRegister>, 8u>&, unsigned int) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:2476:24
#19 0x00000000033a9337 llvm::RAGreedy::selectOrSplit(llvm::LiveInterval const&, llvm::SmallVectorImpl<llvm::Register>&) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:2151:7
#20 0x000000000337bd85 llvm::RegAllocBase::allocatePhysRegs() /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocBase.cpp:114:9
#21 0x00000000033ad3cd llvm::RAGreedy::runOnMachineFunction(llvm::MachineFunction&) /home/kdrewnia/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp:2772:3
[...]
```

A `git bisect` run showed that this crash first appears after #74467.

Full reproduction information, along with the variant inputs and settings that do or do not trigger the crash, is provided below. In short, passing `-amdgpu-codegenprepare-disable-idiv-expansion=true` avoids the failure.

## Reproduction files
All of these files are `opt -O3 -mtriple=amdgcn-amd-amdhsa` output.

I apologize in advance for the lack of a smaller test case, as `bugpoint` didn't have much luck with this one.

[reproducer.ll.txt](https://github.com/llvm/llvm-project/files/14877595/reproducer.ll.txt) is the input that triggers the crash. It is a matrix multiplication implementation.

[fewer-batches-passing.ll.txt](https://github.com/llvm/llvm-project/files/14877637/fewer-batches-passing.ll.txt) is the same code with a smaller batch size. The input IR was identical to the failing case except for the statically known number of workgroups (annotated as a `!range`), which differs between the two files.

The relevant part of the diff between those two files is:
```
--- reproducer.ll       2024-04-04 21:13:02.778679418 +0000
+++ fewer-batches-passing.ll    2024-04-04 21:14:50.335567529 +0000
@@ -5,29 +5,28 @@ target datalayout = "e-p:64:64-p1:64:64-
 @__wg_rock_gemm_0 = internal unnamed_addr addrspace(3) global [8192 x i8] undef, align 64
 @__wg_rock_gemm_1 = internal unnamed_addr addrspace(3) global [8192 x i8] undef, align 64

-define amdgpu_kernel void @rock_gemm(ptr inreg noalias nocapture nofree noundef nonnull readonly align 16 dereferenceable(805306368) %0, ptr inreg noalias nocapture nofree noundef nonnull readonly align 16 dereferenceable(100663296) %1, ptr inreg noalias nocapture nofree noundef nonnull writeonly align 16 dereferenceable(301989888) %2) local_unnamed_addr #0 !reqd_work_group_size !0 {
+define amdgpu_kernel void @rock_gemm(ptr inreg noalias nocapture nofree noundef nonnull readonly align 16 dereferenceable(125829120) %0, ptr inreg noalias nocapture nofree noundef nonnull readonly align 16 dereferenceable(15728640) %1, ptr inreg noalias nocapture nofree noundef nonnull writeonly align 16 dereferenceable(47185920) %2) local_unnamed_addr #0 !reqd_work_group_size !0 {
 .preheader21.preheader:
   %3 = tail call i32 @llvm.amdgcn.workgroup.id.x(), !range !1
   %.fr = freeze i32 %3
-  %.lhs.trunc = trunc i32 %.fr to i16
-  %4 = udiv i16 %.lhs.trunc, 24
-  %5 = mul i16 %4, 24
-  %.decomposed = sub i16 %.lhs.trunc, %5
-  %.zext17 = zext nneg i16 %.decomposed to i32
-  %.cmp = icmp ugt i16 %.decomposed, 21
+  %.lhs.trunc = trunc i32 %.fr to i8
+  %4 = udiv i8 %.lhs.trunc, 24
+  %5 = mul i8 %4, 24
+  %.decomposed = sub i8 %.lhs.trunc, %5
+  %.zext17 = zext nneg i8 %.decomposed to i32
+  %.cmp = icmp ugt i8 %.decomposed, 21
   %6 = select i1 %.cmp, i32 11, i32 0
   %7 = sub nuw nsw i32 12, %6
   %8 = tail call i32 @llvm.umin.i32(i32 %7, i32 11)
-  %.lhs.trunc18 = trunc i16 %.decomposed to i8
   %.rhs.trunc = trunc i32 %8 to i8
-  %9 = urem i8 %.lhs.trunc18, %.rhs.trunc
+  %9 = urem i8 %.decomposed, %.rhs.trunc
@@ -1633,7 +1632,7 @@ attributes #4 = { convergent mustprogres
 attributes #5 = { nounwind }

 !0 = !{i32 256, i32 1, i32 1}
-!1 = !{i32 0, i32 1536}
+!1 = !{i32 0, i32 240}
 !2 = !{i32 0, i32 256}
 !3 = !{}
 !4 = !{!5}
```
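The type change in the diff is consistent with the `!range` bound: values in `[0, 1536)` need more than 8 bits, while values in `[0, 240)` fit in an `i8`. The following is a minimal Python sketch of that arithmetic, not a description of what any particular LLVM pass does. It assumes the optimizer chooses among the widths 8, 16, 32 and 64; the helper names `narrowest_int_width` and `decomposed` are hypothetical. It also checks that the `udiv`/`mul`/`sub` sequence computes the remainder modulo 24 at either width.

```python
def narrowest_int_width(upper_exclusive: int) -> int:
    """Smallest of 8/16/32/64 bits that holds every value in [0, upper_exclusive).

    Assumption: only these power-of-two widths are candidates.
    """
    max_value = upper_exclusive - 1
    for width in (8, 16, 32, 64):
        if max_value < (1 << width):
            return width
    raise ValueError("range does not fit in 64 bits")


def decomposed(x: int, width: int) -> int:
    """Mirror the udiv/mul/sub sequence from the diff at the given bit width."""
    mask = (1 << width) - 1
    x &= mask                  # trunc i32 %.fr to iN
    quotient = x // 24         # udiv iN %.lhs.trunc, 24
    product = (quotient * 24) & mask  # mul iN %4, 24
    return (x - product) & mask       # sub iN %.lhs.trunc, %5


for upper in (1536, 240):
    width = narrowest_int_width(upper)
    # The sequence is a remainder: x - (x / 24) * 24 == x % 24.
    assert all(decomposed(x, width) == x % 24 for x in range(upper))
    print(f"!range [0, {upper}) -> i{width}")
```

Under these assumptions the failing file's bound maps to `i16` and the passing file's bound maps to `i8`, matching the `trunc` instructions on each side of the diff.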

[reproducer-barriers-removed.ll.txt](https://github.com/llvm/llvm-project/files/14877651/reproducer-barriers-removed.ll.txt) is `reproducer.ll` with the `call void asm` statements removed. This variant also does not crash.

## Steps to reproduce
(The `-mattr` flags are kept to match the original source of the bug.)

```
llc -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -mattr=+sramecc,-xnack ./reproducer.ll
```

This will crash as seen above.

However, 
```
llc -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -mattr=+sramecc,-xnack ./reproducer.ll -amdgpu-codegenprepare-disable-idiv-expansion=true
```
will not crash.

Similarly, replacing `reproducer.ll` with either of the two variant files will not trigger the bug.

(Finally, adding `-global-isel` will also avoid the crash.)
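
The configurations described in this report can be checked in one pass with a small driver. This is a hedged sketch rather than part of the original report: it assumes an assertions-enabled `llc` on `PATH`, that the three attachments were saved locally under the `.ll` names shown, and that output can be discarded via the null device. The helper names `build_command`, `hit_assertion` and `run_all` are hypothetical.

```python
import os
import subprocess

# Flags taken from the reproduction command in the report.
BASE_FLAGS = ["-O2", "-mtriple=amdgcn-amd-amdhsa", "-mcpu=gfx908",
              "-mattr=+sramecc,-xnack"]

# (input file, extra flags, whether the report says this configuration crashes)
CASES = [
    ("reproducer.ll", [], True),
    ("reproducer.ll", ["-amdgpu-codegenprepare-disable-idiv-expansion=true"], False),
    ("reproducer.ll", ["-global-isel"], False),
    ("fewer-batches-passing.ll", [], False),
    ("reproducer-barriers-removed.ll", [], False),
]


def build_command(input_file, extra_flags, llc="llc"):
    """Assemble the llc command line; output goes to the null device."""
    return [llc, *BASE_FLAGS, input_file, "-o", os.devnull, *extra_flags]


def hit_assertion(stderr):
    """True if stderr contains the SplitKit "Interference" assertion failure."""
    return "Assertion" in stderr and '"Interference"' in stderr


def run_all(llc="llc"):
    """Run every case and report whether it matches the behaviour in the report."""
    for input_file, extra_flags, expected_crash in CASES:
        proc = subprocess.run(build_command(input_file, extra_flags, llc),
                              capture_output=True, text=True)
        crashed = proc.returncode != 0 and hit_assertion(proc.stderr)
        status = "as reported" if crashed == expected_crash else "DIFFERS"
        print(f"{input_file} {' '.join(extra_flags)}: crashed={crashed} ({status})")
```

Calling `run_all()` from the directory that holds the three `.ll` files prints one line per configuration. A release build without assertions would not emit the assertion message, so every case would report `crashed=False`.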