llvmbot wrote:

<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clangir

Author: Srivarshitha M (16srivarshitha)

<details>
<summary>Changes</summary>

Related: #<!-- -->175871, #<!-- -->179278

When `-fgpu-default-stream=per-thread` is specified, CUDA and HIP kernels 
should be launched using the per-thread stream variants of the launch API 
instead of the default `cudaLaunchKernel`/`hipLaunchKernel`.

This PR implements that by selecting the correct launch function name in 
`emitDeviceStubBodyNew`:
For CUDA: `cudaLaunchKernel_ptsz`
For HIP: `hipLaunchKernel_spt`

This matches the behavior of the OG CodeGen implementation in `CGCUDANV.cpp` 
and resolves the `errorNYI("CUDA/HIP Stream per thread")` that was previously 
hit when this stream mode was requested.

The existing kernel launch infrastructure (the `__cudaPushCallConfiguration` / 
`__cudaPopCallConfiguration` flow and device stub calls) was already upstream - 
this PR completes the kernel launch calls section of the CUDA/HIP tracking 
issue by adding the missing stream-per-thread support.

Tested locally with FileCheck for CUDA-NEW, HIP-NEW, CUDA-PTH, HIP-PTH, and 
DEVICE check prefixes.

---
Full diff: https://github.com/llvm/llvm-project/pull/188004.diff


2 Files Affected:

- (modified) clang/lib/CIR/CodeGen/CIRGenCUDANV.cpp (+7-3) 
- (modified) clang/test/CIR/CodeGenCUDA/kernel-call.cu (+12) 


``````````diff
diff --git a/clang/lib/CIR/CodeGen/CIRGenCUDANV.cpp 
b/clang/lib/CIR/CodeGen/CIRGenCUDANV.cpp
index 8b8e99023eceb..1bae5e470aadd 100644
--- a/clang/lib/CIR/CodeGen/CIRGenCUDANV.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenCUDANV.cpp
@@ -150,10 +150,14 @@ void 
CIRGenNVCUDARuntime::emitDeviceStubBodyNew(CIRGenFunction &cgf,
 
   // The default stream is usually stream 0 (the legacy default stream).
   // For per-thread default stream, we need a different LaunchKernel function.
-  StringRef kernelLaunchAPI = "LaunchKernel";
+  std::string kernelLaunchAPI = "LaunchKernel";
   if (cgm.getLangOpts().GPUDefaultStream ==
-      LangOptions::GPUDefaultStreamKind::PerThread)
-    cgm.errorNYI("CUDA/HIP Stream per thread");
+      LangOptions::GPUDefaultStreamKind::PerThread) {
+    if (cgm.getLangOpts().HIP)
+      kernelLaunchAPI += "_spt";
+    else if (cgm.getLangOpts().CUDA)
+      kernelLaunchAPI += "_ptsz";
+  }
 
   std::string launchKernelName = addPrefixToName(kernelLaunchAPI);
   const IdentifierInfo &launchII =
diff --git a/clang/test/CIR/CodeGenCUDA/kernel-call.cu 
b/clang/test/CIR/CodeGenCUDA/kernel-call.cu
index 2d37b6eef73af..230bcdfe6e22c 100644
--- a/clang/test/CIR/CodeGenCUDA/kernel-call.cu
+++ b/clang/test/CIR/CodeGenCUDA/kernel-call.cu
@@ -14,6 +14,14 @@
 // RUN:   -emit-cir %s -x cuda -fcuda-is-device -o %t.device.cir
 // RUN: FileCheck --input-file=%t.device.cir %s --check-prefix=DEVICE
 
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-sdk-version=9.2 \
+// RUN:   -fgpu-default-stream=per-thread -DCUDA_API_PER_THREAD_DEFAULT_STREAM 
\
+// RUN:   -emit-cir %s -x cuda -o - | FileCheck %s --check-prefix=CUDA-PTH
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fhip-new-launch-api \
+// RUN:   -fgpu-default-stream=per-thread -DHIP_API_PER_THREAD_DEFAULT_STREAM \
+// RUN:   -emit-cir %s -x hip -o - | FileCheck %s --check-prefix=HIP-PTH
+
 #include "Inputs/cuda.h"
 
 
@@ -55,6 +63,8 @@
 // Check cudaLaunchKernel is called with all 6 arguments:
 // func ptr, gridDim, blockDim, args, sharedMem, stream
 // CUDA-NEW: cir.call @cudaLaunchKernel({{.*}}) : (!cir.ptr<!void>{{.*}}, 
!rec_dim3, !rec_dim3, !cir.ptr<!cir.ptr<!void>>{{.*}}, !u64i{{.*}}, 
!cir.ptr<!rec_cudaStream>{{.*}}) -> (!u32i {llvm.noundef})
+// CUDA-PTH: cir.call @cudaLaunchKernel_ptsz
+
 //
 // HIP-NEW: cir.global constant external @_Z6kernelif = 
#cir.global_view<@_Z21__device_stub__kernelif> : !cir.func<(!s32i, !cir.float)>
 // HIP-NEW-LABEL: cir.func {{.*}} @_Z21__device_stub__kernelif
@@ -62,6 +72,8 @@
 // HIP-NEW: cir.call @__hipPopCallConfiguration({{.*}}) : 
(!cir.ptr<!rec_dim3>, !cir.ptr<!rec_dim3>, !cir.ptr<!u64i>, 
!cir.ptr<!cir.ptr<!rec_hipStream>>) -> !s32i
 // HIP-NEW: cir.get_global @_Z6kernelif : !cir.ptr<!cir.func<(!s32i, 
!cir.float)>>
 // HIP-NEW: cir.call @hipLaunchKernel({{.*}}) : (!cir.ptr<!void> {{.*}}, 
!rec_dim3, !rec_dim3, !cir.ptr<!cir.ptr<!void>>{{.*}}, !u64i{{.*}}, 
!cir.ptr<!rec_hipStream>{{.*}}) -> (!u32i {llvm.noundef})
+// HIP-PTH: cir.call @hipLaunchKernel_spt
+
 __global__ void kernel(int x, float y) {}
 
 // ===----------------------------------------------------------------------===

``````````

</details>


https://github.com/llvm/llvm-project/pull/188004
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to