krzysz00 wrote:

Re discussion on the other PR about "why is this even an intrinsic" - since 
this probably shouldn't just be in @jayfoad's DMs:

The reason I disagree with "just pattern-match it" is that you can't get the 
scheduling you want without the guarantee the intrinsic gives you.
 
Namely, while 
```
global_load_b32 v1, v0
ds_write_addtid_b32 v1, s0
```
is obviously equivalent to
```
s_mov_b32 m0, s0
global_load_lds_b32 v0
```
if we turn that first example into
``` 
pipelined_loop: {
  global_load_b32 v2, v0
  ...
  waitcnt(lds only) + barrier
  ds_read v*, ...
  mfmas(v)
  waitcnt(lds)+s_barrier
  waitcnt(vmem) ;; and not substantially earlier please
  ds_write_addtid_b32 v2, s0
  jle pipelined_loop
}
```
for example, we really don't want that match firing, because the LDS tile would get overwritten while it's still being read.
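
In source terms, the contrast looks roughly like this. This is a hedged OpenCL-C sketch, not anything from this PR: I'm assuming the existing `__builtin_amdgcn_global_load_lds` shape (global src, LDS dst, constant size/offset/aux), a wave64 gfx940-class target, and the addtid layout where each lane's dword lands at LDS base + 4*lane; `do_compute`, `TILE`, and the function names are made up for illustration.
```
// Hedged sketch; assumptions: gfx940-class target, wave64, and that the
// direct-to-LDS load writes each lane's dword to LDS base + 4*lane (the
// addtid layout). do_compute and TILE are made-up stand-ins.

#define TILE 256                            // tile size in dwords, illustrative

// Stand-in for the ds_read + MFMA stage.
void do_compute(__local const uint *tile) { (void)tile; }

// The two spellings of the simple case: the first stages through a VGPR
// and relies on the backend fusing the load with the addtid-style store;
// the second is guaranteed to stay one direct-to-LDS instruction.
void stage_via_vgpr(__global uint *src, __local uint *lds) {
  uint tid = __builtin_amdgcn_workitem_id_x();
  lds[tid] = src[tid];                      // global_load_b32 + ds_write
}

void stage_direct(__global uint *src, __local uint *lds) {
  uint tid = __builtin_amdgcn_workitem_id_x();
  __builtin_amdgcn_global_load_lds(src + tid, lds,
                                   /*size=*/4, /*offset=*/0, /*aux=*/0);
}

// Single-buffered pipelined loop (priming load for tile 0 omitted): the
// prefetched value must sit in a VGPR until after the second barrier, so
// the fusion above must NOT fire here.
void pipelined_single(__global uint *src, __local uint *tile, int steps) {
  uint tid = __builtin_amdgcn_workitem_id_x();
  for (int i = 0; i < steps; ++i) {
    uint next = src[(i + 1) * TILE + tid];  // global_load_b32, issued early
    barrier(CLK_LOCAL_MEM_FENCE);
    do_compute(tile);                       // MFMAs on the current tile
    barrier(CLK_LOCAL_MEM_FENCE);
    tile[tid] = next;                       // ds_write, only once everyone is
                                            // done reading the old tile
  }
}
```
Here the prefetched value has to sit in a VGPR until after the second barrier, so a "helpful" fusion into the direct-to-LDS form isn't just a scheduling question, it's a correctness one.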
 
... *unless* we're double-buffering into LDS and so trying to do
```
pipelined_lds: {
  waitcnt(vmem,lds)+barrier
  load_lds(global1(iv), lds2)
  do_compute(lds1)
  waitcnt(vmem,lds)+barrier
  load_lds(global2(iv), lds1)
  do_compute(lds2) ;; We'd better not be waiting on lds1 to settle at/before here
  iv += 2
}
```
where, if the pattern match for the addtid load fails to fire, say because 
waitcnt insertion gets in the way, that'll cause problems for the program.
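
In the same sketch style (same caveats and made-up stand-ins as above), the double-buffered version is where the direct-to-LDS form is exactly what we want, and the only waits we can afford are the ones the barriers imply; in particular, nothing feeding do_compute(lds2) should be made to wait on the load that is still filling lds1:
```
// Same assumptions as the previous sketch (priming loads omitted);
// TILE/do_compute redeclared so this stands alone.
#define TILE 256
void do_compute(__local const uint *tile);

void pipelined_double(__global uint *src, __local uint *lds1,
                      __local uint *lds2, int steps) {
  uint tid = __builtin_amdgcn_workitem_id_x();
  for (int i = 0; i < steps; i += 2) {
    barrier(CLK_LOCAL_MEM_FENCE);           // wants waitcnt(vmem,lds)+s_barrier
    __builtin_amdgcn_global_load_lds(src + (i + 1) * TILE + tid, lds2,
                                     /*size=*/4, /*offset=*/0, /*aux=*/0);
    do_compute(lds1);                       // the buffer filled last half-step
    barrier(CLK_LOCAL_MEM_FENCE);           // wants waitcnt(vmem,lds)+s_barrier
    __builtin_amdgcn_global_load_lds(src + (i + 2) * TILE + tid, lds1,
                                     /*size=*/4, /*offset=*/0, /*aux=*/0);
    do_compute(lds2);                       // must not be made to wait on the
                                            // lds1 load issued just above
  }
}
```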
 
Not to mention that, because we don't have an intrinsic for ds_addtid, and because 
there are a *lot* of ways to spell the lane ID (mbcnt, workitem.id.x with 
annotations, a bunch of workitem IDs mod 64, etc.), the pattern match itself would 
be quite fragile.
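
For a sense of what "a lot of ways to spell the lane ID" means in practice, here are a few equivalent spellings a matcher would have to canonicalize (same OpenCL-C sketch style; wave64 assumed, function names made up):
```
// A few equivalent spellings of the lane ID that an addtid pattern match
// would have to recognize (wave64 assumed).

// mbcnt idiom: count the lanes below this one.
uint lane_mbcnt(void) {
  return __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u));
}

// Raw workitem ID, valid when the workgroup is known to be a single wave
// (e.g. via a matching reqd_work_group_size annotation).
uint lane_workitem(void) {
  return __builtin_amdgcn_workitem_id_x();
}

// Workitem ID reduced modulo the wave size.
uint lane_mod(void) {
  return (uint)get_local_id(0) % 64u;
}
```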
 
So, in the context of GEMM stuff, I'd rather not leave this at "hope the compiler 
recognizes what we're trying to do". If the compiler can be taught to recognize 
that reliably in the future, that'll be cool, but I can't be the one to write that 
patch, and I don't think there's infinite bandwidth among the AMDGPU crowd for 
that improvement either.

https://github.com/llvm/llvm-project/pull/137425