https://github.com/nicebert created https://github.com/llvm/llvm-project/pull/205325
This series adds no-loop SPMD kernel promotion for OpenMP target offload, with the groundwork it depends on. - target-fast implies teams/threads oversubscription: Enable -fopenmp-assume-teams-oversubscription and -fopenmp-assume-threads-oversubscription by default under -fopenmp-target-fast. This is groundwork for the no-loop support, which depends on these assumptions being in effect. Adds driver test coverage for both. - no-loop SPMD kernel promotion: Promote target teams distribute parallel for kernels to no-loop SPMD mode when teams and threads oversubscription are assumed and no num_teams or reduction clause is present, mirroring Flang's MLIR promotion. Adds a C offload test alongside the existing Fortran one. >From b4820be40e259e1c4db4ff3bcb3580ecf6591191 Mon Sep 17 00:00:00 2001 From: Nicole Aschenbrenner <[email protected]> Date: Tue, 9 Jun 2026 10:06:13 -0500 Subject: [PATCH 1/2] [OpenMP] target-fast implies teams/threads oversubscription Enable -fopenmp-assume-teams-oversubscription and -fopenmp-assume-threads-oversubscription by default under -fopenmp-target-fast. This is groundwork for the upcoming no-loop support, which depends on these assumptions being in effect. Adds driver test coverage for both. --- clang/lib/Driver/ToolChains/Clang.cpp | 4 ++-- clang/test/Driver/openmp-target-fast-flag.c | 22 ++++++++++++++------- 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 8a0efd70e6c0d..81f364a27f6d8 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -6946,11 +6946,11 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, // thread and team counts in the device. if (Args.hasFlag(options::OPT_fopenmp_assume_teams_oversubscription, options::OPT_fno_openmp_assume_teams_oversubscription, - /*Default=*/false)) + /*Default=*/TargetFastUsed)) CmdArgs.push_back("-fopenmp-assume-teams-oversubscription"); if (Args.hasFlag(options::OPT_fopenmp_assume_threads_oversubscription, options::OPT_fno_openmp_assume_threads_oversubscription, - /*Default=*/false)) + /*Default=*/TargetFastUsed)) CmdArgs.push_back("-fopenmp-assume-threads-oversubscription"); // Handle -fopenmp-assume-no-thread-state (implied by target-fast) diff --git a/clang/test/Driver/openmp-target-fast-flag.c b/clang/test/Driver/openmp-target-fast-flag.c index 0390790b3f533..d6b0c262f8431 100644 --- a/clang/test/Driver/openmp-target-fast-flag.c +++ b/clang/test/Driver/openmp-target-fast-flag.c @@ -1,25 +1,25 @@ // REQUIRES: x86-registered-target, amdgpu-registered-target // RUN: %clang -### -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a -nogpulib %s -O0 2>&1 \ -// RUN: | FileCheck -check-prefixes=DefaultTFast,DefaultTState,DefaultNoNestParallel %s +// RUN: | FileCheck -check-prefixes=DefaultTFast,DefaultTState,DefaultNoNestParallel,DefaultTeamsOver,DefaultThreadsOver %s // RUN: %clang -### -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a -nogpulib -O0 -fopenmp-target-fast %s 2>&1 \ -// RUN: | FileCheck -check-prefixes=TState,NestParallel %s +// RUN: | FileCheck -check-prefixes=TState,NestParallel,TeamsOver,ThreadsOver %s // RUN: %clang -### -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a -nogpulib -O3 %s 2>&1 \ -// RUN: | FileCheck -check-prefixes=O3,DefaultTFast,DefaultTState,DefaultNoNestParallel %s +// RUN: | FileCheck -check-prefixes=O3,DefaultTFast,DefaultTState,DefaultNoNestParallel,DefaultTeamsOver,DefaultThreadsOver %s // RUN: %clang -### -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a -nogpulib -O3 -fno-openmp-target-fast %s 2>&1 \ -// RUN: | FileCheck -check-prefixes=O3,DefaultTState,DefaultNoNestParallel %s +// RUN: | FileCheck -check-prefixes=O3,DefaultTState,DefaultNoNestParallel,DefaultTeamsOver,DefaultThreadsOver %s // RUN: %clang -### -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a -nogpulib -Ofast %s 2>&1 \ -// RUN: | FileCheck -check-prefixes=OFast,TState,NestParallel %s +// RUN: | FileCheck -check-prefixes=OFast,TState,NestParallel,TeamsOver,ThreadsOver %s // RUN: %clang -### -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a -nogpulib -Ofast -fno-openmp-target-fast %s 2>&1 \ -// RUN: | FileCheck -check-prefixes=OFast,DefaultTState,DefaultNoNestParallel %s +// RUN: | FileCheck -check-prefixes=OFast,DefaultTState,DefaultNoNestParallel,DefaultTeamsOver,DefaultThreadsOver %s // RUN: %clang -### -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a -nogpulib -O0 -fno-openmp-target-fast -fopenmp-target-fast %s 2>&1 \ -// RUN: | FileCheck -check-prefixes=TState,NestParallel %s +// RUN: | FileCheck -check-prefixes=TState,NestParallel,TeamsOver,ThreadsOver %s // O3: -O3 // OFast: -Ofast @@ -33,3 +33,11 @@ // NestParallel: "-fopenmp-assume-no-nested-parallelism" // NestParallel-NOT: "-fno-openmp-assume-no-nested-parallelism" // DefaultNoNestParallel-NOT: {{"-f(-no-)?openmp-assume-no-nested-parallelism"}} + +// TeamsOver: "-fopenmp-assume-teams-oversubscription" +// TeamsOver-NOT: "-fno-openmp-assume-teams-oversubscription" +// DefaultTeamsOver-NOT: {{"-f(no-)?openmp-assume-teams-oversubscription"}} + +// ThreadsOver: "-fopenmp-assume-threads-oversubscription" +// ThreadsOver-NOT: "-fno-openmp-assume-threads-oversubscription" +// DefaultThreadsOver-NOT: {{"-f(no-)?openmp-assume-threads-oversubscription"}} >From 2d8afd0603f1ac4a2d80c0b680f1f3e0039aa568 Mon Sep 17 00:00:00 2001 From: Nicole Aschenbrenner <[email protected]> Date: Fri, 19 Jun 2026 05:50:00 -0500 Subject: [PATCH 2/2] [OpenMP][Clang] Add no-loop SPMD kernel promotion Promote target teams distribute parallel for kernels to no-loop SPMD mode when teams and threads oversubscription are assumed and no num_teams or reduction clause is present, mirroring Flang's MLIR promotion. Adds a C offload test alongside the existing Fortran one. --- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp | 23 +++++- offload/test/offloading/target-no-loop.c | 91 ++++++++++++++++++++++++ 2 files changed, 111 insertions(+), 3 deletions(-) create mode 100644 offload/test/offloading/target-no-loop.c diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index cb0e7297f1a89..77e1d96eb2c41 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -701,6 +701,18 @@ static bool supportsSPMDExecutionMode(ASTContext &Ctx, "Unknown programming model for OpenMP directive on NVPTX target."); } +/// Check whether a target kernel can be promoted to a "no-loop" SPMD kernel, +/// mirroring Flang's MLIR promotion path. +static bool canPromoteToNoLoop(const LangOptions &LangOpts, + const OMPExecutableDirective &D) { + OpenMPDirectiveKind DKind = D.getDirectiveKind(); + return (DKind == OMPD_target_teams_distribute_parallel_for || + DKind == OMPD_target_teams_distribute_parallel_for_simd) && + LangOpts.OpenMPTeamSubscription && LangOpts.OpenMPThreadSubscription && + !D.hasClausesOfKind<OMPNumTeamsClause>() && + !D.hasClausesOfKind<OMPReductionClause>(); +} + void CGOpenMPRuntimeGPU::emitNonSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName, llvm::Function *&OutlinedFn, @@ -746,9 +758,14 @@ void CGOpenMPRuntimeGPU::emitKernelInit(const OMPExecutableDirective &D, CodeGenFunction &CGF, EntryFunctionState &EST, bool IsSPMD) { llvm::OpenMPIRBuilder::TargetKernelDefaultAttrs Attrs; - Attrs.ExecFlags = - IsSPMD ? llvm::omp::OMPTgtExecModeFlags::OMP_TGT_EXEC_MODE_SPMD - : llvm::omp::OMPTgtExecModeFlags::OMP_TGT_EXEC_MODE_GENERIC; + if (IsSPMD && canPromoteToNoLoop(CGM.getLangOpts(), D)) + Attrs.ExecFlags = + llvm::omp::OMPTgtExecModeFlags::OMP_TGT_EXEC_MODE_SPMD_NO_LOOP; + else + Attrs.ExecFlags = + IsSPMD ? llvm::omp::OMPTgtExecModeFlags::OMP_TGT_EXEC_MODE_SPMD + : llvm::omp::OMPTgtExecModeFlags::OMP_TGT_EXEC_MODE_GENERIC; + computeMinAndMaxThreadsAndTeams(D, CGF, Attrs); CGBuilderTy &Bld = CGF.Builder; diff --git a/offload/test/offloading/target-no-loop.c b/offload/test/offloading/target-no-loop.c new file mode 100644 index 0000000000000..74ef18f60067c --- /dev/null +++ b/offload/test/offloading/target-no-loop.c @@ -0,0 +1,91 @@ +// C counterpart of fortran/target-no-loop.f90. + +// RUN: %libomptarget-compile-generic -O3 -fopenmp-assume-threads-oversubscription -fopenmp-assume-teams-oversubscription +// RUN: env LIBOMPTARGET_INFO=16 OMP_NUM_TEAMS=16 OMP_TEAMS_THREAD_LIMIT=16 %libomptarget-run-generic 2>&1 | %fcheck-generic +// REQUIRES: gpu +// XFAIL: intelgpu + +#include <stdio.h> + +static int check_errors(int *array) { + int errors = 0; + for (int i = 0; i < 1024; ++i) + if (array[i] != i + 1) + ++errors; + return errors; +} + +int main(void) { + int array[1024]; + int errors = 0; + int red; + + for (int i = 0; i < 1024; ++i) + array[i] = 1; + + // No-loop kernel +#pragma omp target teams distribute parallel for + for (int i = 0; i < 1024; ++i) + array[i] = i + 1; + errors += check_errors(array); + + // SPMD kernel (num_teams clause blocks promotion to no-loop) + for (int i = 0; i < 1024; ++i) + array[i] = 1; +#pragma omp target teams distribute parallel for num_teams(3) + for (int i = 0; i < 1024; ++i) + array[i] = i + 1; + errors += check_errors(array); + + // No-loop kernel + for (int i = 0; i < 1024; ++i) + array[i] = 1; +#pragma omp target teams distribute parallel for num_threads(64) + for (int i = 0; i < 1024; ++i) + array[i] = i + 1; + errors += check_errors(array); + + // SPMD kernel + for (int i = 0; i < 1024; ++i) + array[i] = 1; +#pragma omp target parallel for + for (int i = 0; i < 1024; ++i) + array[i] = i + 1; + errors += check_errors(array); + + // Generic kernel + for (int i = 0; i < 1024; ++i) + array[i] = 1; +#pragma omp target teams distribute + for (int i = 0; i < 1024; ++i) + array[i] = i + 1; + errors += check_errors(array); + + // SPMD kernel (reduction clause blocks promotion to no-loop) + for (int i = 0; i < 1024; ++i) + array[i] = 1; + red = 0; +#pragma omp target teams distribute parallel for reduction(+ : red) + for (int i = 0; i < 1024; ++i) + red += array[i]; + if (red != 1024) + ++errors; + + printf("number of errors: %d\n", errors); + + return 0; +} + +// CHECK: PluginInterface device {{[0-9]+}} info: Launching kernel {{.*}} SPMD-No-Loop mode +// CHECK: info: #Args: 2 Teams x Thrds: 64x 16 +// CHECK: PluginInterface device {{[0-9]+}} info: Launching kernel {{.*}} SPMD mode +// CHECK: info: #Args: 2 Teams x Thrds: 3x 16 {{.*}} +// CHECK: PluginInterface device {{[0-9]+}} info: Launching kernel {{.*}} SPMD-No-Loop mode +// CHECK: info: #Args: 2 Teams x Thrds: 64x 16 {{.*}} +// CHECK: PluginInterface device {{[0-9]+}} info: Launching kernel {{.*}} SPMD mode +// CHECK: info: #Args: 2 Teams x Thrds: 1x 16 +// CHECK: PluginInterface device {{[0-9]+}} info: Launching kernel {{.*}} Generic-SPMD mode +// CHECK: info: #Args: 2 Teams x Thrds: 16x 16 {{.*}} +// CHECK: PluginInterface device {{[0-9]+}} info: Launching kernel {{.*}} SPMD mode +// CHECK: info: #Args: 3 Teams x Thrds: 16x 16 {{.*}} +// CHECK: number of errors: 0 _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
