================
@@ -0,0 +1,44 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1
| FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 |
FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 |
FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc
\
+// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib
-nogpuinc \
+// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// O3ONCE: "-O3"
+// O3ONCE-NOT: "-O3"
+
+// RUN: %clang -fopenmp=libomp
-fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \
+// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64
--offload-arch=sm_52,sm_60 -nogpuinc \
+// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -###
%s 2>&1 \
+// RUN: | FileCheck -check-prefix=OPENMP %s
+//
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"],
output: "[[HOST_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]",
"[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]",
"[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]",
"[[HOST_BC]]"], output: "[[SM52_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs:
["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]",
"[[HOST_BC]]"], output: "[[SM60_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs:
["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs:
["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"],
output: "[[BINARY:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]",
"[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs:
["[[HOST_OBJ]]"], output: "a.out"
+
+// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3
-Xarch_sm_60 -O0 \
+// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib
-nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=CUDA %s
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52"
{{.*}}"-O3"
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60"
{{.*}}"-O0"
+// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3"
+
+// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda
\
+// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=SPECIFIC %s
+// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52"
----------------
Artem-B wrote:
You're probably the one holding most of the puzzle pieces here. For CUDA/HIP
things (mostly) work well enough. Even though there are somewhat useful
attempts to target spir-V, they are half-baked at best. For OpenMP the target
set is widely open, so I can see how it would need a more flexible way to
customize the compilation pipeline construction.
In abstract, part of the problem boils down to naming. You need to be able to
tell the driver, what we want to target (currently we use --offload-arch= and
-offload options for that) and how to select subsets of the constructed
pipeline (-X<host|device|arch|something else>).
`--offload-arch=amdgcnspirv,gfx1030` looks like a reasonable approach to me,
and the same naming scheme could be used as a selector part of `-Xarch`. What
we have now may not be perfect, but I think it's reasonably functional. Is
there a particular reason `--offload-arch=amdgcnspirv,gfx1030` is not
sufficient to drive pipeline construction? I'm not sure why we want -Xarch to
do that job.
> Compared to those I really don't think -Xarch_ is a big deal, since it does
> what you want, passes those arguments only to the Triple toolchain.
My issue is that the patch was making -Xarch do the job it's not intended for
*and* hiding real errors in the process. The "passing the arguments to the
triple{-selected} toolchain" (as an wider-scope selector, compared to per-GPU
or host/device ones we have now) part is fine. Let's handle pipeline creation
challenge separately.
https://github.com/llvm/llvm-project/pull/125421
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits