Issue 56063
Summary: [CUDA] Performance regression in CUDA Clang for the RSBench mini-app
Labels: cuda, performance
Assignees: (none)
Reporter: jhuber6
    The [RSBench](https://github.com/ANL-CESAR/RSBench.git) mini-application has a performance regression when targeting CUDA on my V100 with CUDA 11.6.2. Previously, Clang's performance was roughly on par with NVCC's, with an execution time of about 2.1 seconds on my machine. After commit 0af3e6a22da2eda5021b5fad656d0b9db7702e0a, performance regressed by roughly 33%, to about 3.1 seconds. Reverting this commit locally restores the original performance, which matches NVCC. These numbers were produced with the following commands. I can provide the IR differences later.

```
$ cd cuda/
$ clang++ --offload-arch=sm_70 -O3 -c main.cu -o main.o
$ clang++ --offload-arch=sm_70 -O3 -c simulation.cu -o simulation.o
$ clang++ --offload-arch=sm_70 -O3 -c io.cu -o io.o
$ clang++ --offload-arch=sm_70 -O3 -c init.cu -o init.o
$ clang++ --offload-arch=sm_70 -O3 -c material.cu -o material.o
$ clang++ --offload-arch=sm_70 -O3 -c utils.cu -o utils.o
$ clang++ --offload-arch=sm_70 -O3 main.o simulation.o io.o init.o material.o utils.o -o rsbench -lm -lcudart
$ nvprof ./rsbench -m event
```
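The "roughly 33%" figure appears to correspond to a loss in throughput rather than an increase in runtime. As a quick sanity check, the two timings above can be converted both ways. This is a sketch assuming a POSIX shell with `awk`. The `regression_report` helper is hypothetical and is not part of RSBench or LLVM.

```shell
#!/bin/sh
# Hypothetical helper: express a slowdown both as a time increase and as a throughput loss.
# Arguments: baseline time and regressed time, in seconds.
regression_report() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    # Relative increase in wall-clock time: (new - old) / old
    printf "time increase: %.1f%%\n", (new - old) / old * 100
    # Relative drop in throughput (work per second): 1 - old / new
    printf "throughput loss: %.1f%%\n", (1 - old / new) * 100
  }'
}

# Timings quoted above: about 2.1 s before the commit, about 3.1 s after it.
regression_report 2.1 3.1
# time increase: 47.6%
# throughput loss: 32.3%
```

By this reading, the kernel does about a third less work per second, which means the same run takes roughly 48% longer.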
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs