[PATCH] D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters.

Michael Liao via Phabricator via cfe-commits Wed, 18 Nov 2020 00:15:11 -0800

hliao added a comment.

As mentioned earlier, that's very experimental support. Even though the SASS 
looks reasonable, it still needs verifying on real systems. For non-kernel 
functions, it seems we share the path. So that we should do a similar thing. 
The current approach fixes that in the codegen phase by adding back the 
`alloca` to match the parameter space semantic. Once that alloca is dynamically 
indexed, it won't be promoted in SROA. Only `instcomb` eliminates that `alloca` 
when it is only modified once by copying from a constant memory. As `instcomb` 
won't break certain patterns prepared in the codegen preparation, it won't run 
in the backend. That dynamically indexed `alloca` won't be removed.




================
Comment at: clang/test/CodeGenCUDA/kernel-args.cu:13-14
 // AMDGCN: define amdgpu_kernel void @_Z6kernel1A(%struct.A addrspace(4)* 
byref(%struct.A) align 8 %{{.+}})
-// NVPTX: define void @_Z6kernel1A(%struct.A* byval(%struct.A) align 8 %x)
+// NVPTX: define void @_Z6kernel1A(%struct.A addrspace(101)* byref(%struct.A) 
align 8 %0)
 __global__ void kernel(A x) {
 }
----------------
tra wrote:
> Is the idea here to rely on PTX to store the value in param space (so we do 
> actually pass the parameter by value)  and represent it on IR level as a 
> reference to an an externally-provided storage with the value.
> So:
> - C++ passes argument by value
> - IR knows that PTX will store it somewhere in param space and uses `byref`
> - we still generate PTX which has parameter passed by value, but now we can 
> access it directly via a reference to param-space value.
> 
> Presumably for parameters we do want to modify, we'll need to fall back to 
> having a local copy.
> 
> So far so good. However, now we may have a problem distinguishing between 
> C++-level arguments passed by value vs by reference -- they all will look 
> like `byref` on IR level. That is, unless you rely on `addrspace(101)` to 
> indicate that it's actually a `byval` in disguise. 
> 
> It looks plausible as long as we can guarantee that we never modify it. 
> Neither in the current function nor in any of the callees, if we pass it by 
> reference. 
> 
> I'm not particularly familiar with AA machinery. I'd appreciate if you could 
> elaborate on how you see it all work end-to-end.
> 
It does the same thing as `nvptx-lower-args` does but applies that earlier in 
the frontend. The upside is that IR is optimized by all the middle-end opts. 
`instcomb` will remove that dynamically indexed `alloca` if it's only modified 
by copying from constant memory. AA teaches the compiler that parameter space 
has the property of constantness. Even though we run SROA after 
`nvptx-lower-args`, but we general won't run `instcomb` in the backend as it 
potentially breaks certain patterns prepared in the codegen preparation phase.

`byref` (newly added) in LLVM IR is different from by-reference in C++. The 
later is translated into a pointer. `byref` in LLVM IR says that content of 
that pointer should not be modified in the function body. It won't be ambiguous 
from the IR side.

It's still possible for the backend to do similar stuff. Once that `byval` 
argument has `readonly`, that `alloca` could be skipped.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91590/new/

https://reviews.llvm.org/D91590

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters.

Reply via email to