llvmorg-github-actions[bot] wrote:

<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-flang-openmp

Author: Julian Brown (jtb20)

<details>
<summary>Changes</summary>

The OpenMP 6.0 'saved' firstprivate modifier (see [7.2] together
with [14.3]) requires each list item to be snapshotted once at
recording time and observable on every replay of the recorded
task.  libomp reuses the same task descriptor across every replay
of a taskgraph-owned task, so the '.kmp_privates.t' tail struct
that holds the firstprivate values is also the natural home for
the saved data environment.  Getting that right needs two
changes, which this patch lands together: the destructor of each
list item must fire exactly once at end-of-taskgraph (not after
every replay), and non-trivially-copyable list items must be
re-constructed per replay so that copy constructors and inner
self-references are respected.
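
To make the second requirement concrete, here is a minimal standalone sketch. It is not code from this patch: `SelfRef` and the two helper functions are hypothetical names invented for illustration. It shows why a byte-for-byte copy of a self-referential list item leaves its inner pointer aimed at the source object, while a copy-constructed clone re-targets it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical list-item type: 'self' is meant to point at this object's 'v'.
struct SelfRef {
  int v;
  int *self;
  explicit SelfRef(int x) : v(x), self(&v) {}
  SelfRef(const SelfRef &o) : v(o.v), self(&v) {} // re-targets 'self'
};

// True when the inner pointer refers to the object's own field.
bool isConsistent(const SelfRef &s) { return s.self == &s.v; }

// Copies the object's bytes into fresh storage, as a bitwise clone would,
// then checks whether the copied 'self' bytes point into that new storage.
bool bitwiseCloneIsConsistent(const SelfRef &src) {
  alignas(SelfRef) unsigned char buf[sizeof(SelfRef)];
  std::memcpy(buf, &src, sizeof(SelfRef));
  int *copiedSelf;
  std::memcpy(&copiedSelf, buf + offsetof(SelfRef, self), sizeof(copiedSelf));
  const void *ownField = buf + offsetof(SelfRef, v);
  return copiedSelf == ownField;
}

// Copy-constructs into fresh storage, which is what re-running the copy
// constructor per clone amounts to.
bool copyConstructedCloneIsConsistent(const SelfRef &src) {
  assert(isConsistent(src) && "source must be a properly constructed object");
  alignas(SelfRef) unsigned char buf[sizeof(SelfRef)];
  SelfRef *dst = new (buf) SelfRef(src);
  bool ok = isConsistent(*dst);
  dst->~SelfRef();
  return ok;
}
```

For a properly constructed `SelfRef`, the bitwise variant reports an inconsistent clone and the copy-constructed variant reports a consistent one.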

On the runtime side, move the per-task destructor-thunk
invocation from __kmp_task_finish (which previously fired it at
the end of every replay, leaving the saved snapshot in a
destructed state for the next replay) to __kmp_taskgraph_free,
so it fires exactly once per task at end-of-taskgraph.  Skip
taskwait nodes (record_map entries with task == nullptr) in that
loop while we are there, to avoid a latent nullptr dereference
that the existing tests do not exercise.
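
The lifetime rule described above can be modelled with a small toy, assuming nothing about libomp's real data structures. `ToyTask`, `ToyRecordEntry`, and the `toy*` functions are hypothetical stand-ins, not runtime APIs. Finishing a replay leaves a taskgraph-owned task's snapshot alive, and the end-of-taskgraph loop fires each destructor once while skipping entries with no task.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the runtime's structures; not libomp's types.
struct ToyTask {
  int saved_value = 0;      // models a firstprivate(saved: ...) snapshot
  bool destructed = false;  // set once the destructor thunk has run
  int destructor_calls = 0; // how many times the thunk fired
};

struct ToyRecordEntry {
  ToyTask *task; // nullptr models a taskwait node in the record
};

// Models the per-task destructor thunk.
void toyDestructorThunk(ToyTask &t) {
  assert(!t.destructed && "snapshot destructed twice");
  t.destructed = true;
  ++t.destructor_calls;
}

// Models finishing one execution of a task. For taskgraph-owned tasks the
// thunk is deferred so the snapshot survives for the next replay.
void toyTaskFinish(ToyTask &t, bool is_taskgraph) {
  if (!is_taskgraph)
    toyDestructorThunk(t);
}

// Models end-of-taskgraph cleanup: fire each thunk exactly once and skip
// entries that have no task. Returns the number of thunks fired.
std::size_t toyTaskgraphFree(std::vector<ToyRecordEntry> &record_map) {
  std::size_t fired = 0;
  for (ToyRecordEntry &entry : record_map) {
    if (entry.task == nullptr)
      continue;
    toyDestructorThunk(*entry.task);
    ++fired;
  }
  return fired;
}
```

In this model, replaying a record any number of times never touches the snapshot, and a record holding two tasks and one taskwait node fires two thunks at free time.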

Previously the runtime cloned each replay's task descriptor with a
bitwise memcpy in __kmp_taskgraph_clone_task, and a FIXME noted that
this silently corrupts firstprivate list items whose type is not
trivially copyable (self-referential structs, types with user-defined
copy constructors / destructors, types holding inner pointers into
themselves).  On the compiler side, emit a per-task clone helper

  void __omp_task_clone.NN(kmp_task_t *dst, kmp_task_t *src,
                           int lastpriv);

modelled on emitTaskDupFunction and reusing emitPrivatesInit (now
extended with a tri-state PrivatesInitMode of Normal / ForDup /
ForClone).  The helper re-runs the copy constructor of each
non-trivially-copyable firstprivate list item, reading from the
source task's '.kmp_privates.t' and writing into the freshly
allocated descriptor's '.kmp_privates.t'.  Tasks whose firstprivates
are all trivially
copyable still rely on the runtime's memcpy fast-path and emit no
clone helper.  emitTaskCall passes the helper to the runtime as
the new trailing argument of __kmpc_taskgraph_task (null when no
helper is needed).
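
The dispatch between the memcpy fast path and the optional helper might look roughly like the sketch below. It is a toy model under stated assumptions: `Item`, `ToyTask`, `ToyCloneFn`, and every `toy*` function are hypothetical and do not reproduce libomp's descriptor layout or the generated helper. Only the overall shape is taken from the description: a bitwise copy first, then a nullable callback with a (dst, src, int) signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical firstprivate type with an inner self-pointer.
struct Item {
  int v;
  int *self;
  explicit Item(int x) : v(x), self(&v) {}
  Item(const Item &o) : v(o.v), self(&v) {}
};

// Toy descriptor: a plain header plus raw storage for the private copy.
struct ToyTask {
  int header;
  alignas(Item) unsigned char privates[sizeof(Item)];
};

// Same parameter shape as the helper described above; the int is unused.
using ToyCloneFn = void (*)(ToyTask *dst, ToyTask *src, int unused);

// Toy helper: copy-construct the private from src's storage into dst's.
void toyCloneItem(ToyTask *dst, ToyTask *src, int /*unused*/) {
  const Item *from = reinterpret_cast<const Item *>(src->privates);
  new (dst->privates) Item(*from);
}

// Bitwise clone first; then let the helper, if any, redo non-trivial fields.
void toyCloneTask(ToyTask *dst, ToyTask *src, ToyCloneFn helper) {
  assert(dst && src && dst != src);
  std::memcpy(dst, src, sizeof(ToyTask));
  if (helper)
    helper(dst, src, 0);
}

// Reads the stored 'self' pointer bytes and checks that they target this
// descriptor's own privates storage.
bool toySelfPointsInside(const ToyTask *t) {
  int *self;
  std::memcpy(&self, t->privates + offsetof(Item, self), sizeof(self));
  const void *ownField = t->privates + offsetof(Item, v);
  return self == ownField;
}
```

Passing a null helper models a task whose firstprivates are all trivially copyable. With `Item`, the null-helper clone keeps a pointer into the source descriptor, and the clone made with `toyCloneItem` points into its own storage.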

Two previously-XFAIL'd taskgraph runtime tests
(taskgraph_replayable_saved_stack_depth.cpp and
taskgraph_shared_stack_depth.cpp) now pass and are un-XFAIL'd.  New
tests cover non-trivially-copyable, self-referential, and
static-storage firstprivate list items, as well as codegen of the
clone helper.

Assisted-By: Claude Opus 4.7


---

Patch is 39.87 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/200408.diff


12 Files Affected:

- (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+111-6) 
- (modified) clang/lib/CodeGen/CGOpenMPRuntime.h (+4) 
- (modified) clang/test/OpenMP/taskgraph_firstprivate_saved_ast_print.cpp (+67) 
- (added) clang/test/OpenMP/taskgraph_task_clone_codegen.cpp (+51) 
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPKinds.def (+1-1) 
- (modified) openmp/runtime/src/kmp.h (+2-1) 
- (modified) openmp/runtime/src/kmp_tasking.cpp (+50-7) 
- (added) openmp/runtime/test/taskgraph/taskgraph_firstprivate_saved_nontrivial.cpp (+146) 
- (added) openmp/runtime/test/taskgraph/taskgraph_firstprivate_saved_nontrivial_selfref.cpp (+97) 
- (added) openmp/runtime/test/taskgraph/taskgraph_firstprivate_saved_static.cpp (+100) 
- (modified) openmp/runtime/test/taskgraph/taskgraph_replayable_saved_stack_depth.cpp (-1) 
- (modified) openmp/runtime/test/taskgraph/taskgraph_shared_stack_depth.cpp (+15-1) 


``````````diff
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index ee8583a9f5519..a48a6e9790975 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -3679,14 +3679,34 @@ emitTaskPrivateMappingFunction(CodeGenModule &CGM, SourceLocation Loc,
 }
 
 /// Emit initialization for private variables in task-based directives.
+/// Selects where \c emitPrivatesInit should read the initial value of each
+/// non-trivial firstprivate copy from.
+enum class PrivatesInitMode {
+  /// Initialize using the captured original lvalues in the caller IR (i.e.
+  /// at task-allocation time).
+  Normal,
+  /// Reading from the source task's \c shareds region. Used by the taskloop
+  /// task-dup function to seed sibling tasks.
+  ForDup,
+  /// Reading from the source task's \c .kmp_privates.t region (the field at
+  /// the same index as the destination). Used by the taskgraph clone
+  /// function to seed the persistent clone from the original task's
+  /// already-initialized snapshot. Works uniformly for captured and
+  /// static-storage firstprivates because no capture lookup is needed.
+  ForClone,
+};
+
 static void emitPrivatesInit(CodeGenFunction &CGF,
                              const OMPExecutableDirective &D,
                              Address KmpTaskSharedsPtr, LValue TDBase,
                              const RecordDecl *KmpTaskTWithPrivatesQTyRD,
                              QualType SharedsTy, QualType SharedsPtrTy,
                              const OMPTaskDataTy &Data,
-                             ArrayRef<PrivateDataTy> Privates, bool ForDup) {
+                             ArrayRef<PrivateDataTy> Privates,
+                             PrivatesInitMode Mode, LValue SrcPrivatesBase) {
   ASTContext &C = CGF.getContext();
+  const bool ForDup = Mode == PrivatesInitMode::ForDup;
+  const bool ForClone = Mode == PrivatesInitMode::ForClone;
   auto FI = std::next(KmpTaskTWithPrivatesQTyRD->field_begin());
   LValue PrivatesBase = CGF.EmitLValueForField(TDBase, *FI);
   OpenMPDirectiveKind Kind = isOpenMPTaskLoopDirective(D.getDirectiveKind())
@@ -3718,8 +3738,11 @@ static void emitPrivatesInit(CodeGenFunction &CGF,
     }
     const VarDecl *VD = Pair.second.PrivateCopy;
     const Expr *Init = VD->getAnyInitializer();
-    if (Init && (!ForDup || (isa<CXXConstructExpr>(Init) &&
-                             !CGF.isTrivialInitializer(Init)))) {
+    // ForDup and ForClone only re-initialize non-trivial firstprivates; the
+    // surrounding runtime memcpy is sufficient for trivially-copyable ones.
+    const bool NonTrivialOnly = ForDup || ForClone;
+    if (Init && (!NonTrivialOnly || (isa<CXXConstructExpr>(Init) &&
+                                     !CGF.isTrivialInitializer(Init)))) {
       LValue PrivateLValue = CGF.EmitLValueForField(PrivatesBase, *FI);
       if (const VarDecl *Elem = Pair.second.PrivateElemInit) {
         const VarDecl *OriginalVD = Pair.second.Original;
@@ -3739,6 +3762,9 @@ static void emitPrivatesInit(CodeGenFunction &CGF,
                  "Expected artificial target data variable.");
           SharedRefLValue =
               CGF.MakeAddrLValue(CGF.GetAddrOfLocalVar(OriginalVD), Type);
+        } else if (ForClone) {
+          // Source is the same field on the origin task's privates record.
+          SharedRefLValue = CGF.EmitLValueForField(SrcPrivatesBase, *FI);
         } else if (ForDup) {
           SharedRefLValue = CGF.EmitLValueForField(SrcBase, SharedField);
           SharedRefLValue = CGF.MakeAddrLValue(
@@ -3889,11 +3915,76 @@ emitTaskDupFunction(CodeGenModule &CGM, SourceLocation Loc,
         CGF.Int8Ty, CGM.getNaturalTypeAlignment(SharedsTy));
   }
   emitPrivatesInit(CGF, D, KmpTaskSharedsPtr, TDBase, KmpTaskTWithPrivatesQTyRD,
-                   SharedsTy, SharedsPtrTy, Data, Privates, /*ForDup=*/true);
+                   SharedsTy, SharedsPtrTy, Data, Privates,
+                   PrivatesInitMode::ForDup, /*SrcPrivatesBase=*/LValue());
   CGF.FinishFunction();
   return TaskDup;
 }
 
+/// Emit task_clone function (for re-initializing non-trivially-copyable
+/// firstprivate copies when cloning a task into a taskgraph record).
+/// \code
+/// void __omp_task_clone(kmp_task_t *task_dst, const kmp_task_t *task_src,
+///                       int /*unused*/) {
+///   // copy-construct each non-trivial firstprivate from
+///   // task_src->.kmp_privates.t into task_dst->.kmp_privates.t.
+/// }
+/// \endcode
+/// The (unused) third parameter is present so that the function shares the
+/// same calling convention as the existing taskloop \c task_dup callback
+/// (\c p_task_dup_t in the runtime), letting the runtime invoke either via
+/// a single function-pointer type.
+static llvm::Value *
+emitTaskCloneFunction(CodeGenModule &CGM, SourceLocation Loc,
+                      const OMPExecutableDirective &D,
+                      QualType KmpTaskTWithPrivatesPtrQTy,
+                      const RecordDecl *KmpTaskTWithPrivatesQTyRD,
+                      QualType SharedsTy, QualType SharedsPtrTy,
+                      const OMPTaskDataTy &Data,
+                      ArrayRef<PrivateDataTy> Privates) {
+  ASTContext &C = CGM.getContext();
+  auto *DstArg = ImplicitParamDecl::Create(
+      C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, KmpTaskTWithPrivatesPtrQTy,
+      ImplicitParamKind::Other);
+  auto *SrcArg = ImplicitParamDecl::Create(
+      C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, KmpTaskTWithPrivatesPtrQTy,
+      ImplicitParamKind::Other);
+  auto *UnusedArg =
+      ImplicitParamDecl::Create(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, C.IntTy,
+                                ImplicitParamKind::Other);
+  FunctionArgList Args{DstArg, SrcArg, UnusedArg};
+  const auto &FnInfo =
+      CGM.getTypes().arrangeBuiltinFunctionDeclaration(C.VoidTy, Args);
+  llvm::FunctionType *FnTy = CGM.getTypes().GetFunctionType(FnInfo);
+  std::string Name = CGM.getOpenMPRuntime().getName({"omp_task_clone", ""});
+  auto *Fn = llvm::Function::Create(FnTy, llvm::GlobalValue::InternalLinkage,
+                                    Name, &CGM.getModule());
+  CGM.SetInternalFunctionAttributes(GlobalDecl(), Fn, FnInfo);
+  if (!CGM.getCodeGenOpts().SampleProfileFile.empty())
+    Fn->addFnAttr("sample-profile-suffix-elision-policy", "selected");
+  Fn->setDoesNotRecurse();
+  CodeGenFunction CGF(CGM);
+  CGF.StartFunction(GlobalDecl(), C.VoidTy, Fn, FnInfo, Args, Loc, Loc);
+
+  LValue DstBase = CGF.EmitLoadOfPointerLValue(
+      CGF.GetAddrOfLocalVar(DstArg),
+      KmpTaskTWithPrivatesPtrQTy->castAs<PointerType>());
+  LValue SrcBase = CGF.EmitLoadOfPointerLValue(
+      CGF.GetAddrOfLocalVar(SrcArg),
+      KmpTaskTWithPrivatesPtrQTy->castAs<PointerType>());
+  // Address the .kmp_privates.t sub-record of the source task; the
+  // destination's privates record is located via DstBase inside
+  // emitPrivatesInit.
+  auto PrivatesFI = std::next(KmpTaskTWithPrivatesQTyRD->field_begin());
+  LValue SrcPrivatesBase = CGF.EmitLValueForField(SrcBase, *PrivatesFI);
+
+  emitPrivatesInit(CGF, D, /*KmpTaskSharedsPtr=*/Address::invalid(), DstBase,
+                   KmpTaskTWithPrivatesQTyRD, SharedsTy, SharedsPtrTy, Data,
+                   Privates, PrivatesInitMode::ForClone, SrcPrivatesBase);
+  CGF.FinishFunction();
+  return Fn;
+}
+
 /// Checks if destructor function is required to be generated.
 /// \return true if cleanups are required, false otherwise.
 static bool
@@ -4395,7 +4486,7 @@ CGOpenMPRuntime::TaskResultTy CGOpenMPRuntime::emitTaskInit(
   if (!Privates.empty()) {
     emitPrivatesInit(CGF, D, KmpTaskSharedsPtr, Base, KmpTaskTWithPrivatesQTyRD,
                      SharedsTy, SharedsPtrTy, Data, Privates,
-                     /*ForDup=*/false);
+                     PrivatesInitMode::Normal, /*SrcPrivatesBase=*/LValue());
     if (isOpenMPTaskLoopDirective(D.getDirectiveKind()) &&
         (!Data.LastprivateVars.empty() || checkInitIsRequired(CGF, Privates))) {
       Result.TaskDupFn = emitTaskDupFunction(
@@ -4403,6 +4494,16 @@ CGOpenMPRuntime::TaskResultTy CGOpenMPRuntime::emitTaskInit(
           KmpTaskTQTyRD, SharedsTy, SharedsPtrTy, Data, Privates,
           /*WithLastIter=*/!Data.LastprivateVars.empty());
     }
+    // For plain tasks (not taskloops) that have at least one non-trivially
+    // copyable firstprivate, emit a clone function so that the runtime can
+    // re-initialize those fields when the task is recorded into a taskgraph.
+    // Taskloops already cover the same need via their TaskDupFn.
+    if (!isOpenMPTaskLoopDirective(D.getDirectiveKind()) &&
+        checkInitIsRequired(CGF, Privates)) {
+      Result.TaskCloneFn = emitTaskCloneFunction(
+          CGM, Loc, D, KmpTaskTWithPrivatesPtrQTy, KmpTaskTWithPrivatesQTyRD,
+          SharedsTy, SharedsPtrTy, Data, Privates);
+    }
   }
   // Fields of union "kmp_cmplrdata_t" for destructors and priority.
   enum { Priority = 0, Destructors = 1 };
@@ -4950,7 +5051,7 @@ void CGOpenMPRuntime::emitTaskCall(
                                                   PrePostActionTy &) {
     llvm::Value *ThreadId = getThreadID(CGF, Loc);
     llvm::Value *UpLoc = emitUpdateLocation(CGF, Loc);
-    std::array<llvm::Value *, 9> TGTaskArgs;
+    std::array<llvm::Value *, 10> TGTaskArgs;
     std::array<llvm::Value *, 3> TaskAllocArgs;
     TaskResultTy Result = emitTaskInit(CGF, Loc, D, TaskFunction, SharedsTy,
                                        Shareds, Data, true, TaskAllocArgs);
@@ -4985,6 +5086,10 @@ void CGOpenMPRuntime::emitTaskCall(
                         ? CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
                               RelocFn, CGM.VoidPtrTy)
                         : llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
+    TGTaskArgs[9] = Result.TaskCloneFn
+                        ? CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
+                              Result.TaskCloneFn, CGF.VoidPtrTy)
+                        : llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
     CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
                             CGM.getModule(), OMPRTL___kmpc_taskgraph_task),
                         TGTaskArgs);
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index d4dbbef5745a5..15fd273bd8936 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -557,6 +557,10 @@ class CGOpenMPRuntime {
     LValue TDBase;
     const RecordDecl *KmpTaskTQTyRD = nullptr;
     llvm::Value *TaskDupFn = nullptr;
+    /// Compiler-emitted helper that re-initializes any non-trivially-copyable
+    /// firstprivate fields after the runtime bitwise-clones a task into a
+    /// taskgraph record. Null when no such helper is required.
+    llvm::Value *TaskCloneFn = nullptr;
   };
   /// Emit task region for the task directive. The task region is emitted in
   /// several steps:
diff --git a/clang/test/OpenMP/taskgraph_firstprivate_saved_ast_print.cpp b/clang/test/OpenMP/taskgraph_firstprivate_saved_ast_print.cpp
index df358df14c82f..23cfbdddb3f68 100644
--- a/clang/test/OpenMP/taskgraph_firstprivate_saved_ast_print.cpp
+++ b/clang/test/OpenMP/taskgraph_firstprivate_saved_ast_print.cpp
@@ -76,4 +76,71 @@ void firstprivate_saved() {
   { (void)a; }
 }
 
+// Per OpenMP 6.0 [14.3], a 'firstprivate' clause with the 'saved' modifier on
+// a replayable construct may include variables with static storage duration;
+// they are copied into the saved data environment of the taskgraph record.
+// This covers file-scope statics, static-local variables, static data
+// members, and const-qualified statics, all of which Sema accepts and Clang
+// codegen places into the per-task '.kmp_privates.t' tail struct.
+
+static int FileScopeStatic = 100;
+static const int FileScopeConstStatic = 200;
+
+struct WithStaticMember {
+  static int StaticMember;
+  static const int StaticConstMember = 400;
+};
+int WithStaticMember::StaticMember = 0;
+
+void firstprivate_saved_statics() {
+  static int LocalStatic = 300;
+  static const int LocalConstStatic = 500;
+
+  // CHECK-LABEL: void firstprivate_saved_statics
+  // CHECK: #pragma omp task firstprivate(saved: FileScopeStatic)
+  #pragma omp task firstprivate(saved: FileScopeStatic)
+  { (void)FileScopeStatic; }
+
+  // CHECK: #pragma omp task firstprivate(saved: FileScopeConstStatic)
+  #pragma omp task firstprivate(saved: FileScopeConstStatic)
+  { (void)FileScopeConstStatic; }
+
+  // CHECK: #pragma omp task firstprivate(saved: LocalStatic)
+  #pragma omp task firstprivate(saved: LocalStatic)
+  { (void)LocalStatic; }
+
+  // CHECK: #pragma omp task firstprivate(saved: LocalConstStatic)
+  #pragma omp task firstprivate(saved: LocalConstStatic)
+  { (void)LocalConstStatic; }
+
+  // CHECK: #pragma omp task firstprivate(saved: WithStaticMember::StaticMember)
+  #pragma omp task firstprivate(saved: WithStaticMember::StaticMember)
+  { (void)WithStaticMember::StaticMember; }
+
+  // CHECK: #pragma omp task firstprivate(saved: WithStaticMember::StaticConstMember)
+  #pragma omp task firstprivate(saved: WithStaticMember::StaticConstMember)
+  { (void)WithStaticMember::StaticConstMember; }
+
+  // Multiple statics in a single clause, mixed with a non-static.
+  int local_int = 0;
+  // CHECK: #pragma omp task firstprivate(saved: FileScopeStatic,LocalStatic,WithStaticMember::StaticMember,local_int)
+  #pragma omp task firstprivate(saved:                                         \
+                                FileScopeStatic, LocalStatic,                  \
+                                WithStaticMember::StaticMember, local_int)
+  {
+    (void)FileScopeStatic;
+    (void)LocalStatic;
+    (void)WithStaticMember::StaticMember;
+    (void)local_int;
+  }
+
+  // Same on a 'taskloop' construct.
+  // CHECK: #pragma omp taskloop firstprivate(saved: FileScopeStatic,LocalConstStatic)
+  #pragma omp taskloop firstprivate(saved: FileScopeStatic, LocalConstStatic)
+  for (int i = 0; i < 4; ++i) {
+    (void)FileScopeStatic;
+    (void)LocalConstStatic;
+  }
+}
+
 #endif
diff --git a/clang/test/OpenMP/taskgraph_task_clone_codegen.cpp b/clang/test/OpenMP/taskgraph_task_clone_codegen.cpp
new file mode 100644
index 0000000000000..9451ad3ba9110
--- /dev/null
+++ b/clang/test/OpenMP/taskgraph_task_clone_codegen.cpp
@@ -0,0 +1,51 @@
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -x c++ -triple x86_64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+// Verifies that for a 'firstprivate' on a task inside a taskgraph whose
+// type has a non-trivial copy constructor, the compiler emits a dedicated
+// '.omp_task_clone.' helper and passes it to __kmpc_taskgraph_task in
+// the trailing argument slot.  The helper re-runs the copy constructor
+// from the origin task's '.kmp_privates.t' field into the clone's, so
+// that the runtime memcpy does not produce a torn copy of a non-
+// trivially-copyable object.
+
+struct NonTrivial {
+  int v;
+  int *self;
+  NonTrivial();
+  NonTrivial(const NonTrivial &other);
+  ~NonTrivial();
+};
+
+void run() {
+  NonTrivial nt;
+#pragma omp taskgraph
+  {
+#pragma omp task firstprivate(nt)
+    {
+      (void)nt.v;
+    }
+  }
+}
+
+// The clone helper is passed as the trailing pointer argument to
+// __kmpc_taskgraph_task (10 total: ident, gtid, task, flags, sizes...,
+// ndeps, deps, relocation, clone).
+// CHECK: call i32 @__kmpc_taskgraph_task(ptr {{[^,]+}}, i32 {{[^,]+}}, ptr {{[^,]+}}, i32 {{[^,]+}}, i64 {{[^,]+}}, i64 {{[^,]+}}, i32 {{[^,]+}}, ptr {{[^,]+}}, ptr {{[^,]+}}, ptr @.omp_task_clone.)
+
+// The clone helper has the same calling convention as the existing
+// taskloop task-dup callback so that the runtime can dispatch through
+// a single function-pointer type; the third parameter is unused here.
+// The body indexes the same .kmp_privates.t field on both source and
+// destination tasks and invokes NonTrivial's copy constructor.
+// CHECK: define internal void @.omp_task_clone.(ptr noundef %{{[^,]+}}, ptr noundef %{{[^,]+}}, i32 noundef %{{[^,]+}})
+// CHECK: getelementptr inbounds {{.*}} %struct.kmp_task_t_with_privates,
+// CHECK: getelementptr inbounds {{.*}} %struct.kmp_task_t_with_privates,
+// CHECK: getelementptr inbounds {{.*}} %struct..kmp_privates.t,
+// CHECK: getelementptr inbounds {{.*}} %struct..kmp_privates.t,
+// CHECK: call void @_ZN10NonTrivialC1ERKS_(
+
+#endif
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index 02e3e1f98e969..85a2eb6f35f22 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -360,7 +360,7 @@ __OMP_RTL(__kmpc_taskgroup, false, Void, IdentPtr, Int32)
 __OMP_RTL(__kmpc_taskgraph, false, Void, IdentPtr, Int32, VoidPtrPtr, SizeTy,
           Int32, Int32, VoidPtr, VoidPtr)
 __OMP_RTL(__kmpc_taskgraph_task, false, Int32, IdentPtr, Int32, VoidPtr, Int32,
-          SizeTy, SizeTy, Int32, VoidPtr, VoidPtr)
+          SizeTy, SizeTy, Int32, VoidPtr, VoidPtr, VoidPtr)
 __OMP_RTL(__kmpc_taskgraph_taskloop, false, Int32, IdentPtr, Int32, VoidPtr,
           Int32, Int32, Int64Ptr, Int64Ptr, Int64,
           Int32, Int32, Int64, Int32, VoidPtr, VoidPtr)
diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index 9ebb7e6f654bc..56e99ef30380e 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -4511,7 +4511,8 @@ KMP_EXPORT void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid,
 KMP_EXPORT kmp_uint32 __kmpc_taskgraph_task(
     ident_t *loc_ref, kmp_int32 gtid, kmp_task_t *new_task, kmp_int32 flags,
     size_t sizeof_kmp_task_t, size_t sizeof_shareds,
-    kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_task_relocate_t reloc);
+    kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_task_relocate_t reloc,
+    void *task_clone);
 KMP_EXPORT kmp_uint32 __kmpc_taskgraph_taskloop(
     ident_t *loc_ref, kmp_int32 gtid, kmp_task_t *new_task, kmp_int32 flags,
     kmp_int32 if_val, kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st,
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 7b3f4b04fbd16..8609c643e112b 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -838,8 +838,22 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
      placed here, since at this point other tasks might have been released
      hence overlapping the destructor invocations with some other work in the
      released tasks.  The OpenMP spec is not specific on when the destructors
-     are invoked, so we should be free to choose. */
-  if (UNLIKELY(taskdata->td_flags.destructors_thunk)) {
+     are invoked, so we should be free to choose.
+
+     For tasks owned by a taskgraph record, the same task descriptor (and
+     therefore the same .kmp_privates.t storage) is reused for every replay.
+     Firing the per-task destructor here would destruct the per-task
+     'firstprivate' copies (e.g. the snapshot used to realise the OpenMP 6.0
+     'firstprivate(saved: ...)' modifier) on the first replay completion,
+     leaving subsequent replays observing destructed state.  Defer the
+     destructor invocation to __kmp_taskgraph_free, which fires it exactly
+     once per task at end-of-taskgraph. */
+  bool defer_destructors_to_taskgraph_free = false;
+#if OMP_TASKGRAPH_EXPERIMENTAL
+  defer_destructors_to_taskgraph_free = is_taskgraph;
+#endif
+  if (UNLIKELY(taskdata->td_flags.destructors_thunk) &&
+      !defer_destructors_to_taskgraph_free) {
     kmp_routine_entry_t destr_thunk = task->data1.destructors;
     KMP_ASSERT(destr_thunk);
     destr_thunk(gtid, task);
@@ -5823,7 +5837,22 @@ static void __kmp_taskgraph_free(kmp_int32 gtid, kmp_taskgraph_record_t *rec,
   __kmp_taskgraph_free_region_metadata(thread, rec->root);
 
   for (size_t task = 0; task < rec->num_tasks; task++) {
-    kmp_taskdata *taskdata = KMP_TASK_TO_TASKDATA(rec->record_map[task].task);
+    // Skip entries that don't have an associated task (e.g. taskwait nodes
+    // recorded by __kmpc_taskgraph_taskwait).
+    if (rec->record_map[task].task == nullptr)
+      continue;
+    kmp_task_t *taskptr = rec->record_map[task].task;
+    kmp_taskdata *taskdata = KMP_TASK_TO_TASKDATA(taskptr);
+    // Fire the per-task...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/200408
_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits