llvmorg-github-actions[bot] wrote:

@llvm/pr-subscribers-clang-codegen

Author: Julian Brown (jtb20)

<details>
<summary>Changes</summary>

OpenMP 6.0 lets a taskgraph region be recorded once and replayed many
times.  Each replay creates a fresh instance of the 'args' pointer
block passed to __kmpc_taskgraph (and may execute at a different stack
location, or even on a different stack), so by-reference captures inside a
recorded task must be re-pointed at the live host objects of the current
invocation; otherwise the recorded tasks would dereference stale memory
from the stack frame of the initial call to __kmpc_taskgraph.

This patch introduces a small piece of infrastructure to do that and wires
it up for the explicit 'task' construct.  A subsequent patch
extends the same scheme to 'taskloop'.

On the compiler side (CGOpenMPRuntime.cpp), a new helper
emitTaskRelocationFunction emits a per-task thunk:

  void __omp_taskgraph_relocate.NN(kmp_task_t *task,
                                   void *outer_captures);

The thunk walks the task's captures and overwrites each entry of
task->shareds with the address of the corresponding field projected from
the freshly reconstructed outer pointer block.  Two classes of capture do
not need updating and are treated as no-ops by the thunk: captures that
correspond to a firstprivate list item (the body reads from the per-task
'.kmp_privates.t' snapshot, populated when the task is allocated and
-- for non-trivial types -- reset on each replay by the clone helper
introduced later), and captures of variables with static storage duration
(their address is fixed at link time).  Reductions of a local-stack variable
are intentionally not in this set: the taskred state is keyed on the
recording-time taskgroup hierarchy and is not yet usable on replay, so
for that case we deliberately keep the current behaviour (the relocation
helper returns null and the runtime aborts), which surfaces the
limitation as a diagnostic.
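To make the thunk's effect concrete, here is a hand-written C++ analogue of what one such helper does for a task whose captures are, in order, a shared local 'a', a global 'g_counter' and a shared local 'b'.  It is a sketch under stated assumptions, not the emitted code: kmp_task_t is cut down to its leading 'shareds' field, and OuterCaptures, g_counter, relocate_example and task_body are hypothetical stand-ins for the outer pointer block, a static-storage capture, the generated __omp_taskgraph_relocate.NN thunk and the outlined task body.

```cpp
#include <cassert>

// Simplified stand-in for kmp_task_t: only the leading 'shareds' field,
// which is all the relocation thunk touches.
struct kmp_task_t {
  void *shareds;
};

// Same shape as the kmp_task_relocate_t typedef added to kmp.h.
typedef void (*kmp_task_relocate_t)(kmp_task_t *, void *);

// Hypothetical outer pointer block for a taskgraph region capturing the
// locals 'a' and 'b' by reference.  Each invocation of the region gets a
// fresh instance of this block.
struct OuterCaptures {
  int *a;
  double *b;
};

// A variable with static storage duration: its address never changes.
int g_counter = 0;

// Hypothetical analogue of a generated relocation thunk.  The shareds table
// holds one void* per capture, in capture order: slot 0 = 'a',
// slot 1 = 'g_counter', slot 2 = 'b'.
void relocate_example(kmp_task_t *task, void *outer_captures) {
  OuterCaptures *outer = static_cast<OuterCaptures *>(outer_captures);
  void **shareds = static_cast<void **>(task->shareds);
  shareds[0] = outer->a; // shared local: re-project from this invocation
  // Slot 1 (&g_counter) is a static-storage capture: nothing to refresh,
  // but the slot index still advances past it.
  shareds[2] = outer->b; // shared local: re-project from this invocation
}

// Reads every capture through the shareds table, as an outlined task body
// would.
double task_body(const kmp_task_t *task) {
  void *const *shareds = static_cast<void *const *>(task->shareds);
  return *static_cast<int *>(shareds[0]) + *static_cast<int *>(shareds[1]) +
         *static_cast<double *>(shareds[2]);
}
```

Calling relocate_example with each invocation's OuterCaptures block re-points slots 0 and 2 at that invocation's 'a' and 'b' and leaves slot 1 alone; without the call, the table would keep pointing at the objects from the recording invocation.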

emitTaskCall now emits such a thunk for each taskgraph-recorded task
and passes it as the new trailing argument of __kmpc_taskgraph_task.
The redundant 'shareds' parameter is dropped, since relocation now
provides the supported mechanism for refreshing that pointer.

On the runtime side (kmp.h, kmp_tasking.cpp, OMPKinds.def),
introduce a new typedef kmp_task_relocate_t and store the callback
on each recorded task in kmp_taskgraph_node_t::relocate, together
with the outer-record pointer captured at __kmpc_taskgraph entry in
kmp_taskgraph_record_t::taskgraph_args.  __kmp_omp_tg_task invokes
the callback on replay, and aborts with a new fatal diagnostic
(OmpTaskgraphBadCapture, i18n/en_US.txt) when a recorded task has a
non-null shareds payload but no relocation callback.  There is also a
fix for a pre-existing bug in __kmp_taskgraph_clone_task -- the cloned
task's shareds pointer was left referring to the original's payload --
which becomes observable as soon as the relocation thunk writes through
that pointer.
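A simplified model of the two runtime-side changes follows: the replay-time dispatch in __kmp_omp_tg_task (call the relocation callback with the record's saved taskgraph_args, otherwise treat a non-null shareds payload as fatal) and the clone fix that re-points the copy's shareds at its own payload.  It is only a sketch: TaskgraphNode, TaskgraphRecord, replay_node, clone_task, OuterX and relocate_x are hypothetical, the block layout omits kmp_taskdata_t, and replay_node returns false where the real runtime calls KMP_FATAL(OmpTaskgraphBadCapture).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Simplified stand-in for kmp_task_t: only the leading 'shareds' field.
struct kmp_task_t {
  void *shareds;
};

// Same shape as the kmp_task_relocate_t typedef added to kmp.h.
typedef void (*kmp_task_relocate_t)(kmp_task_t *, void *);

// Hypothetical cut-down kmp_taskgraph_node_t / kmp_taskgraph_record_t,
// holding only the fields this patch adds or uses.
struct TaskgraphNode {
  kmp_task_t *task;
  kmp_task_relocate_t relocate; // null when the compiler could not emit one
};

struct TaskgraphRecord {
  void *taskgraph_args = nullptr; // refreshed on every __kmpc_taskgraph entry
};

// Mirrors the new check in __kmp_omp_tg_task.  Returns false where the real
// runtime aborts with KMP_FATAL(OmpTaskgraphBadCapture).
bool replay_node(TaskgraphNode *node, const TaskgraphRecord *rec) {
  if (node->relocate)
    node->relocate(node->task, rec->taskgraph_args);
  else if (node->task->shareds != nullptr)
    return false;
  return true; // the real runtime now hands the task to __kmp_omp_task
}

// Byte-copies a block laid out as [kmp_task_t | shareds payload] and, like
// the fixed __kmp_taskgraph_clone_task, re-points the copy's shareds at the
// copy's own payload instead of leaving it aimed at the original's.
kmp_task_t *clone_task(const kmp_task_t *orig, std::size_t shareds_offset,
                       std::size_t sizeof_shareds) {
  std::size_t total = shareds_offset + sizeof_shareds;
  char *copy = static_cast<char *>(std::malloc(total));
  if (copy == nullptr)
    return nullptr;
  std::memcpy(copy, orig, total);
  kmp_task_t *copy_task = reinterpret_cast<kmp_task_t *>(copy);
  if (orig->shareds != nullptr)
    copy_task->shareds = copy + shareds_offset;
  return copy_task;
}

// Hypothetical outer record capturing one local 'x' by reference, and the
// matching relocation thunk (shareds slot 0 holds &x).
struct OuterX {
  int *x;
};

void relocate_x(kmp_task_t *task, void *outer_captures) {
  static_cast<void **>(task->shareds)[0] =
      static_cast<OuterX *>(outer_captures)->x;
}
```

Without the fix-up in clone_task, the copy's shareds would still point into the original block, so relocate_x would overwrite the original task's table rather than the clone's.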

New libomp tests cover lexical and non-lexical shared captures,
pointer captures, non-trivial types, recursive recordings,
stack-depth differences across replays, and the saved/expired-
graph cases.

Assisted-By: Claude Opus 4.7


---

Patch is 47.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/200404.diff


17 Files Affected:

- (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+177-8)
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPKinds.def (+1-1)
- (modified) openmp/runtime/src/i18n/en_US.txt (+1)
- (modified) openmp/runtime/src/kmp.h (+5-2)
- (modified) openmp/runtime/src/kmp_tasking.cpp (+32-11)
- (added) openmp/runtime/test/taskgraph/taskgraph_firstprivate_stack_depth.cpp (+111)
- (added) openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_mixed_capture.cpp (+44)
- (added) openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_nontrivial_type.cpp (+58)
- (added) openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_nontrivial_type_recursive.cpp (+86)
- (added) openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_pointer.cpp (+42)
- (added) openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_pointer_recursive_frameid.cpp (+75)
- (added) openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_recursive.cpp (+44)
- (added) openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_works.cpp (+41)
- (added) openmp/runtime/test/taskgraph/taskgraph_replayable_nonlexical_shared_fails_1.cpp (+47)
- (added) openmp/runtime/test/taskgraph/taskgraph_replayable_nonlexical_shared_fails_2.cpp (+66)
- (added) openmp/runtime/test/taskgraph/taskgraph_replayable_saved_stack_depth.cpp (+115)
- (added) openmp/runtime/test/taskgraph/taskgraph_shared_stack_depth.cpp (+93)


``````````diff
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 9f00545cd0839..9f342038f2285 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -2241,6 +2241,169 @@ void CGOpenMPRuntime::emitTaskyieldCall(CodeGenFunction &CGF,
     Region->emitUntiedSwitch(CGF);
 }
 
+/// Emit a helper with the runtime relocation signature (kmp_task_relocate_t):
+///   void relocate(kmp_task_t *task, void *outer_captures);
+///
+/// On taskgraph replay the runtime invokes this helper to refresh the task's
+/// shared-pointer table. Each capture (a shared-by-ref variable or \c this)
+/// that the task body actually dereferences at execution time is
+/// re-projected from the freshly reconstructed outer record passed as
+/// \p outer_captures and stored back into \c task->shareds.
+///
+/// Captures that the body cannot observe a changed address for across
+/// replays are skipped here:
+///
+///   * captures of a variable that appears as a firstprivate list item
+///     -- the body sources the value from the per-task '.kmp_privates.t'
+///     snapshot rather than from the shareds slot, so the (potentially
+///     stale) original address in the shareds entry is harmless;
+///
+///   * captures of a variable with static (global / namespace-scope /
+///     static-local / static-data-member) storage duration -- the
+///     captured pointer is the variable's link-time-fixed address, which
+///     is identical at recording and on every replay, so no re-projection
+///     is meaningful.
+///
+/// The relocate helper is therefore only ever called upon to refresh
+/// shareds slots that the body genuinely depends on at execution time
+/// (shared-by-ref to a local variable, captured \c this on a heap or
+/// stack object, etc.).  When every capture falls in one of the
+/// skip-eligible categories the helper is emitted as a (still non-null)
+/// no-op: today's runtime only inspects null-vs-non-null, and a non-null
+/// no-op is the right signal that there is nothing the body actually
+/// needs the shareds table refreshed for.
+///
+/// Reduction captures of a local-stack variable still keep the existing
+/// null-relocate-and-abort behaviour: the taskred runtime state is keyed
+/// off the recording-time taskgroup hierarchy and is not currently usable
+/// on replay, so it is preferable to fail loudly (#302) than to silently
+/// misbehave.  Reduction captures of a static-storage variable do not run
+/// into this hazard at the relocate layer -- the captured pointer is
+/// stable -- and are no-op-skipped via the static-storage rule above;
+/// whether the reduction body itself then succeeds on replay is a
+/// separate concern.
+///
+/// Returns null only when at least one capture is genuinely shared (none
+/// of the skip-eligible categories apply) AND cannot be resolved in
+/// \p OuterCSI; in that case the caller passes a null relocation function
+/// to the runtime and the runtime fails fast at replay.
+static llvm::Function *
+emitTaskRelocationFunction(CodeGenModule &CGM, SourceLocation Loc,
+                           const CapturedStmt &CS,
+                           const CodeGenFunction::CGCapturedStmtInfo *OuterCSI,
+                           const OMPTaskDataTy &Data) {
+  ASTContext &C = CGM.getContext();
+
+  // Variables that don't need their shareds slot refreshed across replays
+  // because the body sources them from the per-task '.kmp_privates.t'
+  // snapshot.  Today this is the set of firstprivate list items (snapshot
+  // is taken at task allocation and reused unchanged by every replay).
+  llvm::SmallPtrSet<const VarDecl *, 8> NoRelocateFirstprivateVars;
+  for (const Expr *E : Data.FirstprivateVars) {
+    if (!E)
+      continue;
+    if (const auto *DRE = dyn_cast<DeclRefExpr>(E->IgnoreParenImpCasts()))
+      if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+        NoRelocateFirstprivateVars.insert(VD->getCanonicalDecl());
+  }
+
+  // A capture is "no-op-safe" with respect to taskgraph replay when
+  // refreshing its shareds slot is provably unnecessary - either because
+  // the body never reads from that slot (firstprivate) or because the
+  // captured pointer is a link-time-fixed address and is therefore
+  // identical at every replay (static storage duration).
+  auto IsNoOpRelocate = [&](const CapturedStmt::Capture &Cap) {
+    if (Cap.capturesThis() || !Cap.capturesVariable())
+      return false;
+    const VarDecl *VD = Cap.getCapturedVar();
+    if (VD->hasGlobalStorage())
+      return true;
+    return NoRelocateFirstprivateVars.contains(VD->getCanonicalDecl());
+  };
+
+  auto LookupOuterField =
+      [&](const CapturedStmt::Capture &Cap) -> const FieldDecl * {
+    if (!OuterCSI)
+      return nullptr;
+    return Cap.capturesThis() ? OuterCSI->getThisFieldDecl()
+                              : OuterCSI->lookup(Cap.getCapturedVar());
+  };
+
+  // Bail out before emitting any IR if a genuinely-shared capture cannot
+  // be resolved in the containing context.  No-op-safe captures (see the
+  // function-level comment) don't participate in this preflight; they
+  // simply cause the helper to skip their slot below.
+  if (llvm::any_of(CS.captures(), [&](const CapturedStmt::Capture &Cap) {
+        assert((Cap.capturesThis() || Cap.capturesVariable()) &&
+               "OpenMP task capture must be shared-by-ref or 'this'");
+        return !IsNoOpRelocate(Cap) && !LookupOuterField(Cap);
+      }))
+    return nullptr;
+
+  // void relocate(void *task, void *outer_captures)
+  auto *TaskArg =
+      ImplicitParamDecl::Create(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr,
+                                C.VoidPtrTy, ImplicitParamKind::Other);
+  auto *OuterArg =
+      ImplicitParamDecl::Create(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr,
+                                C.VoidPtrTy, ImplicitParamKind::Other);
+  FunctionArgList Args{TaskArg, OuterArg};
+  const CGFunctionInfo &FnInfo =
+      CGM.getTypes().arrangeBuiltinFunctionDeclaration(C.VoidTy, Args);
+
+  std::string Name =
+      CGM.getOpenMPRuntime().getName({"omp", "taskgraph", "relocate", ""});
+  auto *Fn = llvm::Function::Create(CGM.getTypes().GetFunctionType(FnInfo),
+                                    llvm::GlobalValue::InternalLinkage, Name,
+                                    &CGM.getModule());
+  CGM.SetInternalFunctionAttributes(GlobalDecl(), Fn, FnInfo);
+  if (!CGM.getCodeGenOpts().SampleProfileFile.empty())
+    Fn->addFnAttr("sample-profile-suffix-elision-policy", "selected");
+  Fn->setDoesNotRecurse();
+
+  CodeGenFunction CGF(CGM);
+  CGF.StartFunction(GlobalDecl(), C.VoidTy, Fn, FnInfo, Args, Loc, Loc);
+
+  CGBuilderTy &Bld = CGF.Builder;
+  CharUnits PtrAlign = CGF.getPointerAlign();
+
+  // Base of the reconstructed outer record for this replay.
+  llvm::Value *OuterRaw = Bld.CreateLoad(CGF.GetAddrOfLocalVar(OuterArg));
+
+  // kmp_task_t::shareds is the first field of the runtime task descriptor;
+  // load it to obtain the void* shared table that we will refresh in place.
+  // The table holds one void* per by-ref capture.
+  llvm::Value *TaskRaw = Bld.CreateLoad(CGF.GetAddrOfLocalVar(TaskArg));
+  llvm::Value *SharedRaw =
+      Bld.CreateLoad(Address(TaskRaw, CGF.VoidPtrTy, PtrAlign));
+  Address SharedTable(SharedRaw, CGF.VoidPtrTy, PtrAlign);
+
+  unsigned Index = 0;
+  for (const CapturedStmt::Capture &Cap : CS.captures()) {
+    // Always advance the slot index so that we stay aligned with the
+    // shareds-table layout established at task allocation.
+    unsigned ThisIndex = Index++;
+    if (IsNoOpRelocate(Cap))
+      continue;
+    // Project the capture's referent from the freshly reconstructed outer
+    // record. EmitLValueForField auto-loads the outer reference field, so
+    // the resulting pointer is the live referent address (not the slot).
+    const FieldDecl *OuterField = LookupOuterField(Cap);
+    assert(OuterField && "preflight should have rejected this capture");
+    QualType OuterTy =
+        C.getCanonicalTagType(cast<RecordDecl>(OuterField->getDeclContext()));
+    LValue OuterBase = CGF.MakeAddrLValue(
+        Address(OuterRaw, CGF.ConvertTypeForMem(OuterTy), PtrAlign), OuterTy);
+    llvm::Value *Mapped =
+        CGF.EmitLValueForField(OuterBase, OuterField).getPointer(CGF);
+    Mapped = Bld.CreatePointerBitCastOrAddrSpaceCast(Mapped, CGM.VoidPtrTy);
+    Bld.CreateStore(Mapped, Bld.CreateConstGEP(SharedTable, ThisIndex));
+  }
+
+  CGF.FinishFunction();
+  return Fn;
+}
+
 void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
                                         SourceLocation Loc,
                                         const OMPExecutableDirective &D,
@@ -4800,22 +4963,28 @@ void CGOpenMPRuntime::emitTaskCall(
     TGTaskArgs[2] = Result.NewTask;
     TGTaskArgs[3] = TaskAllocArgs[0]; // TaskFlags
     TGTaskArgs[4] = TaskAllocArgs[1]; // KmpTaskTWithPrivatesTySize
-    TGTaskArgs[5] = Shareds.emitRawPointer(CGF);
-    TGTaskArgs[6] = TaskAllocArgs[2]; // SharedsSize
+    TGTaskArgs[5] = TaskAllocArgs[2]; // SharedsSize
     if (auto RecType = dyn_cast<RecordType>(SharedsTy)) {
       auto *RD = RecType->getAsRecordDecl();
       if (RD->fields().empty()) {
         // FIXME: The condition might not be precisely correct here.
-        TGTaskArgs[6] = CGF.Builder.getSize(0);
+        TGTaskArgs[5] = CGF.Builder.getSize(0);
       }
     }
     if (Data.Dependences.size() == 0) {
-      TGTaskArgs[7] = CGF.Builder.getInt32(0);
-      TGTaskArgs[8] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
+      TGTaskArgs[6] = CGF.Builder.getInt32(0);
+      TGTaskArgs[7] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
     } else {
-      TGTaskArgs[7] = NumOfElements;
-      TGTaskArgs[8] = DependenciesArray.emitRawPointer(CGF);
-    }
+      TGTaskArgs[6] = NumOfElements;
+      TGTaskArgs[7] = DependenciesArray.emitRawPointer(CGF);
+    }
+    const auto *CS = cast<CapturedStmt>(D.getAssociatedStmt());
+    llvm::Function *RelocFn =
+        emitTaskRelocationFunction(CGM, Loc, *CS, CGF.CapturedStmtInfo, Data);
+    TGTaskArgs[8] = RelocFn
+                        ? CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
+                              RelocFn, CGM.VoidPtrTy)
+                        : llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
     CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
                             CGM.getModule(), OMPRTL___kmpc_taskgraph_task),
                         TGTaskArgs);
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index fc24280eaa077..e32308df74cae 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -360,7 +360,7 @@ __OMP_RTL(__kmpc_taskgroup, false, Void, IdentPtr, Int32)
 __OMP_RTL(__kmpc_taskgraph, false, Void, IdentPtr, Int32, VoidPtrPtr, SizeTy,
           Int32, Int32, VoidPtr, VoidPtr)
 __OMP_RTL(__kmpc_taskgraph_task, false, Int32, IdentPtr, Int32, VoidPtr, Int32,
-          SizeTy, VoidPtr, SizeTy, Int32, VoidPtr)
+          SizeTy, SizeTy, Int32, VoidPtr, VoidPtr)
 __OMP_RTL(__kmpc_taskgraph_taskloop, false, Int32, IdentPtr, Int32, VoidPtr,
           Int32, SizeTy, VoidPtr, SizeTy, Int32, Int64Ptr, Int64Ptr, Int64,
           Int32, Int32, Int64, Int32, VoidPtr)
diff --git a/openmp/runtime/src/i18n/en_US.txt b/openmp/runtime/src/i18n/en_US.txt
index 08e837d3dea11..3cd852abd66c6 100644
--- a/openmp/runtime/src/i18n/en_US.txt
+++ b/openmp/runtime/src/i18n/en_US.txt
@@ -482,6 +482,7 @@ AffHWSubsetIgnoringAttr      "KMP_HW_SUBSET: ignoring %1$s attribute. This machi
 TargetMemNotAvailable        "Target memory not available, will use default allocator."
 AffIgnoringNonHybrid         "%1$s ignored: This machine is not a hybrid architecutre. Using \"%2$s\" instead."
 AffIgnoringNotAvailable      "%1$s ignored: %2$s is not available. Using \"%3$s\" instead."
+OmpTaskgraphBadCapture       "Cannot locate captured shared variable reference for taskgraph replay"
 
 # --------------------------------------------------------------------------------------------------
 -*- HINTS -*-
diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index d660c4e191d13..befca12786e70 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2482,6 +2482,7 @@ extern kmp_uint64 __kmp_taskloop_min_tasks;
 /*!
  */
 typedef kmp_int32 (*kmp_routine_entry_t)(kmp_int32, void *);
+typedef void (*kmp_task_relocate_t)(struct kmp_task *, void *);
 
 typedef union kmp_cmplrdata {
   kmp_int32 priority; /**< priority specified by user for the task */
@@ -2692,6 +2693,7 @@ typedef struct kmp_taskgraph_region_dep {
 typedef struct kmp_taskgraph_node {
   kmp_task_t *task;
   bool taskloop_task;
+  kmp_task_relocate_t relocate;
   kmp_taskgraph_reduce_input_data_t *reduce_input;
   union {
     // Valid when KMP_TDG_RECORDING in parent taskgraph record.
@@ -2777,6 +2779,7 @@ typedef struct kmp_taskgraph_record {
   struct kmp_taskgraph_exec_descr *exec_descrs;
   kmp_size_t exec_descr_size;
   kmp_lock_t replay_lock;
+  void *taskgraph_args = nullptr;
   // We need a taskgroup structure to keep track of recorded tasks.  This is
   // set to TRUE if the user requested "nogroup" on the taskgraph directive
   // (then we can avoid blocking at the end of the taskgraph region on replay,
@@ -4507,8 +4510,8 @@ KMP_EXPORT void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid,
                                  void *args);
 KMP_EXPORT kmp_uint32 __kmpc_taskgraph_task(
     ident_t *loc_ref, kmp_int32 gtid, kmp_task_t *new_task, kmp_int32 flags,
-    size_t sizeof_kmp_task_t, void *shareds, size_t sizeof_shareds,
-    kmp_int32 ndeps, kmp_depend_info_t *dep_list);
+    size_t sizeof_kmp_task_t, size_t sizeof_shareds,
+    kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_task_relocate_t reloc);
 KMP_EXPORT kmp_uint32 __kmpc_taskgraph_taskloop(
     ident_t *loc_ref, kmp_int32 gtid, kmp_task_t *new_task, kmp_int32 flags,
     size_t sizeof_kmp_task_t, void *shareds, size_t sizeof_shareds,
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 2f73a75f11e7c..d595c555a72c0 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -2352,10 +2352,11 @@ static void __kmp_exec_descr_link_instances(kmp_taskgraph_exec_descr_t *descrs,
 
 /// Reset, reparent and regroup the recorded task TASK and re-invoke it.
 
-static void __kmp_omp_tg_task(kmp_int32 gtid, kmp_task_t *task,
+static void __kmp_omp_tg_task(kmp_int32 gtid, kmp_taskgraph_node_t *node,
                               kmp_taskgroup_t *taskgroup,
                               kmp_taskdata_t *parent_taskdata,
                               bool serialize_immediate) {
+  kmp_task_t *task = node->task;
   kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
   taskdata->td_parent = parent_taskdata;
 
@@ -2378,6 +2379,18 @@ static void __kmp_omp_tg_task(kmp_int32 gtid, kmp_task_t *task,
   if (parent_taskdata->td_flags.tasktype == TASK_EXPLICIT)
     KMP_ATOMIC_INC(&parent_taskdata->td_allocated_child_tasks);
 
+  if (node->relocate) {
+    // Call the task's relocation function with the incoming args from the owning
+    // taskgraph.  This rewrites capture-by-reference variables to point to the
+    // correct location on the replayed taskgraph's stack (which may not be the
+    // same as the location from the initial recorded taskgraph).
+    node->relocate(task, taskdata->owning_taskgraph->taskgraph_args);
+  } else if (task->shareds != NULL) {
+    // A missing relocation callback is only fatal when there is a non-empty
+    // shareds payload that may contain by-reference captures needing remap.
+    KMP_FATAL(OmpTaskgraphBadCapture);
+  }
+
   __kmp_omp_task(gtid, task, false);
 }
 
@@ -2404,9 +2417,9 @@ static void __kmp_taskgraph_exec_descr_start(kmp_int32 gtid, kmp_info_t *thread,
     kmp_int32 nblocks = KMP_ATOMIC_DEC(&lowest_descr->nblocks);
     if (nblocks <= 0) {
       if (descr->region->type == TASKGRAPH_REGION_NODE) {
-        kmp_task_t *task = descr->region->task.node->task;
+        kmp_taskgraph_node_t *node = descr->region->task.node;
         kmp_taskdata_t *current_taskdata = thread->th.th_current_task;
-        __kmp_omp_tg_task(gtid, task, taskgroup, current_taskdata, false);
+        __kmp_omp_tg_task(gtid, node, taskgroup, current_taskdata, false);
       } else {
         // There's no task for a 'taskwait', so start successors immediately.
         kmp_taskgraph_exec_descr_t *walk = descr;
@@ -2447,9 +2460,9 @@ static void __kmp_taskgraph_exec_descr_start(kmp_int32 gtid, kmp_info_t *thread,
     kmp_taskgraph_exec_descr_t *item = head;
     do {
       assert(item->region->type == TASKGRAPH_REGION_NODE);
-      kmp_task_t *task = item->region->task.node->task;
+      kmp_taskgraph_node_t *node = item->region->task.node;
       kmp_taskdata_t *current_taskdata = thread->th.th_current_task;
-      __kmp_omp_tg_task(gtid, task, taskgroup, current_taskdata, true);
+      __kmp_omp_tg_task(gtid, node, taskgroup, current_taskdata, true);
       item = item->sibling;
     } while (item != head);
     break;
@@ -5023,6 +5036,7 @@ __kmp_taskgraph_node_alloc(kmp_taskgraph_record_t *rec, kmp_task_t *task,
 
   new_task->task = task;
   new_task->taskloop_task = false;
+  new_task->relocate = nullptr;
   new_task->reduce_input = nullptr;
   new_task->u.unresolved.ndeps = 0;
   new_task->u.unresolved.dep_list = nullptr;
@@ -5755,6 +5769,7 @@ static void __kmp_taskgraph_reset(kmp_taskgraph_record_t *rec, kmp_int32 gtid,
   rec->num_mutexes = 0;
   rec->exec_descrs = nullptr;
   rec->exec_descr_size = 0;
+  rec->taskgraph_args = nullptr;
   rec->next = nullptr;
 }
 
@@ -5852,10 +5867,6 @@ static kmp_task_t *__kmp_taskgraph_clone_task(kmp_info_t *thread,
   // FIXME: This should use a "taskdup" function like taskloops in cases where
   // private variables are not trivially copyable.  For now, do it by plain
   // bitwise copy.
-  // FIXME 2: It's intended that this copy be persistent, and can be
-  // re-executed on taskgraph replay.  Make sure that works (for shared
-  // variables) if stack addresses change (i.e. a task-generating function is
-  // called from different call stack depths).
   kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(orig);
   size_t shareds_offset = sizeof(kmp_taskdata_t) + sizeof_kmp_task_t;
   shareds_offset = __kmp_round_up_to_val(shareds_offset, sizeof(kmp_uint64));
@@ -5864,6 +5875,11 @@ static kmp_task_t *__kmp_taskgraph_clone_task(kmp_info_t *thread,
   KMP_MEMCPY(copy_td, taskdata, shareds_offset + sizeof_shareds);
   // Tasks cloned for a taskgraph always have this field set.
   copy_td->owning_taskgraph = taskgraph;
+  kmp_task_t *copy_task = KMP_TASKDATA_TO_TASK(copy_td);
+  if (orig->shareds) {
+    // New task's shared data has now moved.  Update the pointer.
+    copy_task->shareds = (void*) ((char*) copy_td + shareds_offset);
+  }
   KMP_ATOMIC_ST_RLX(&copy_td->td_incomplete_child_tasks, 0);
   return KMP_TASKDATA_TO_TASK(copy_td);
 }
@@ -5972,6 +5988,9 @@ void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid,
     // taskgroup.
     KMP_ATOMIC_ST_REL(&taskgroup->taskgraph.recording, record);
   }
+  // Keep the current taskgraph invocation's outlined-entry args for
+  // replay-time relocation of by-reference captures.
+  record->taskgraph_args = args;
   __kmp_release_lock(&header->header_lock, gtid);
 
   kmp_taskgraph_status_t status = KMP_ATOMIC_LD_ACQ(&record->status);
@@ -6000,9 +6019,10 @@ void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid,
 
 kmp_uint32 __kmpc_taskgraph_task(ident_t *loc_ref, kmp_int32 gtid,
                                  kmp_task_t *new_task, kmp_int32 flags,
-                                 size_t sizeof_kmp_task_t, void *shareds,
+                                 size_t sizeof_kmp_task_t,
                                  size_t sizeof_shareds, kmp_int32 ndeps,
-                                 kmp_depend_info_t *dep_list) {
+                                 kmp_depend_info_t *dep_list,
+                                 kmp_task_relocate_t relocate) {
   kmp_info_t *thread = __kmp_threads[gtid];
   kmp_taskgroup_t *taskgroup = thread->th.th_current_task->td_taskgroup;
   kmp_taskgraph_record_t *rec = __kmp_taskgraph_or_parent_recording(taskgroup);
@@ -6038,6 +6058,7 @@ kmp_uint...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/200404
_______________________________________________
llvm-branch-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
