https://github.com/jtb20 created 
https://github.com/llvm/llvm-project/pull/200404

OpenMP 6.0 lets a taskgraph region be recorded once and replayed many
times.  Each replay creates a fresh instance of the 'args' pointer
block passed to __kmpc_taskgraph (and may execute at a different stack
location, or even on a different stack), so by-reference captures inside a
recorded task must be re-pointed at the live host objects of the current
invocation; otherwise the recorded tasks would dereference stale memory
from the stack frame of the initial call to __kmpc_taskgraph.

This patch introduces the small amount of infrastructure needed to do
that and wires it up for the explicit 'task' construct.  A subsequent
patch extends the same scheme to 'taskloop'.

On the compiler side (CGOpenMPRuntime.cpp), a new helper
emitTaskRelocationFunction emits a per-task thunk:

  void __omp_taskgraph_relocate.NN(kmp_task_t *task,
                                   void *outer_captures);

The thunk walks the task's captures and overwrites each entry of
task->shareds with the address of the corresponding field projected from
the freshly reconstructed outer pointer block.  Two classes of capture do
not need updating and are treated as no-ops by the thunk: captures that
correspond to a firstprivate list item (the body reads from the per-task
'.kmp_privates.t' snapshot, populated when the task is allocated and
-- for non-trivial types -- reset on each replay by the clone helper
introduced later), and captures of variables with static storage duration
(their address is link-time fixed).  Reductions of a local-stack variable
are intentionally not in this set: the taskred state is keyed on the
recording-time taskgroup hierarchy and is not yet usable on replay.  For
that case the patch preserves today's relocate-returns-null /
runtime-aborts behaviour, so that the limitation surfaces as a diagnostic.
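
As an illustration of what such a thunk does, here is a minimal sketch in
plain C.  It is not the code clang emits: the reloc_* types and functions
are hypothetical stand-ins (only the 'shareds' field of kmp_task_t is
modelled), and the capture layouts are invented for the example.  Each call
to reloc_invoke plays the role of one taskgraph invocation whose locals live
in a fresh stack frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for kmp_task_t; only 'shareds' is modelled. */
typedef struct {
  void *shareds; /* block of by-reference capture pointers */
} reloc_task_t;

/* Hypothetical shareds layout of one recorded task (assumed for the demo). */
typedef struct {
  int *sum;
  int *n;
} reloc_shareds_t;

/* Hypothetical outer pointer block, rebuilt on every invocation.  Its
   field order differs from the task's layout on purpose. */
typedef struct {
  int *n;
  int *unused;
  int *sum;
} reloc_outer_t;

/* Plays the role of __omp_taskgraph_relocate.NN: overwrite each shareds
   entry with the matching field of the fresh outer block. */
static void reloc_thunk(reloc_task_t *task, void *outer_captures) {
  assert(task->shareds != NULL);
  reloc_shareds_t *sh = (reloc_shareds_t *)task->shareds;
  reloc_outer_t *outer = (reloc_outer_t *)outer_captures;
  sh->sum = outer->sum;
  sh->n = outer->n;
}

/* Task body: reads 'n' and updates 'sum' through the shareds block. */
static void reloc_task_body(reloc_task_t *task) {
  reloc_shareds_t *sh = (reloc_shareds_t *)task->shareds;
  *sh->sum += *sh->n;
}

/* One invocation: the captured variables live in this frame, so the task
   must be re-pointed before its body runs. */
static int reloc_invoke(reloc_task_t *task, int n) {
  int sum = 0;
  int unused = 0;
  reloc_outer_t outer = {&n, &unused, &sum};
  reloc_thunk(task, &outer);
  reloc_task_body(task);
  return sum;
}
```

Calling reloc_invoke twice on the same task object with different values of
n updates the 'sum' of the current call each time; without the reloc_thunk
call, the second invocation would write through pointers into the first
call's dead frame.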

emitTaskCall now emits such a thunk for each taskgraph-recorded task
and passes it as the new trailing argument of __kmpc_taskgraph_task.
The redundant 'shareds' parameter is dropped, since relocation now
provides the supported mechanism for refreshing that pointer.

On the runtime side (kmp.h, kmp_tasking.cpp, OMPKinds.def), the patch
introduces a new typedef kmp_task_relocate_t and stores the callback
on each recorded task in kmp_taskgraph_node_t::relocate, together
with the outer-record pointer captured at __kmpc_taskgraph entry in
kmp_taskgraph_record_t::taskgraph_args.  __kmp_omp_tg_task invokes
the callback on replay, and aborts with a new fatal diagnostic
(OmpTaskgraphBadCapture, i18n/en_US.txt) when a recorded task has a
non-null shareds payload but no relocation callback.  There is also a
fix for a pre-existing bug in __kmp_taskgraph_clone_task -- the cloned
task's shareds pointer was left referring to the original's payload --
which becomes observable as soon as the relocation thunk writes through
that pointer.
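
The replay-side behaviour described above can be sketched as follows.  This
is an assumption-laden model, not the libomp implementation: every replay_*
name is hypothetical, the task layout (a header followed by an inline
two-slot payload) is invented, and a status code stands in for the fatal
OmpTaskgraphBadCapture diagnostic.  Whether the real runtime checks for the
callback in exactly this order is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical task: header pointer plus an inline shareds payload. */
typedef struct {
  void *shareds;    /* NULL when the task has no by-reference captures */
  void *payload[2]; /* storage that 'shareds' normally points at */
} replay_task_t;

/* Models kmp_task_relocate_t. */
typedef void (*replay_relocate_t)(replay_task_t *task, void *outer_captures);

typedef struct {
  replay_task_t *task;
  replay_relocate_t relocate; /* models kmp_taskgraph_node_t::relocate */
} replay_node_t;

typedef struct {
  void *taskgraph_args; /* models kmp_taskgraph_record_t::taskgraph_args */
} replay_record_t;

typedef enum { REPLAY_OK, REPLAY_BAD_CAPTURE } replay_status_t;

/* Clone a recorded task.  A plain copy leaves dst->shareds aimed at the
   original's payload; re-pointing it at the clone's own payload is the
   shape of the fix described above. */
static void replay_clone_task(replay_task_t *dst, const replay_task_t *src) {
  memcpy(dst, src, sizeof *dst);
  if (src->shareds != NULL)
    dst->shareds = dst->payload;
}

/* Example callback: store the outer pointer in the first shareds slot. */
static void replay_store_outer(replay_task_t *task, void *outer_captures) {
  assert(task->shareds != NULL);
  ((void **)task->shareds)[0] = outer_captures;
}

/* Replay one node: relocate if a callback exists; report a bad capture
   when there are shareds but no way to refresh them. */
static replay_status_t replay_node(const replay_record_t *rec,
                                   replay_node_t *node) {
  if (node->relocate != NULL) {
    node->relocate(node->task, rec->taskgraph_args);
    return REPLAY_OK;
  }
  return node->task->shareds != NULL ? REPLAY_BAD_CAPTURE : REPLAY_OK;
}
```

With the re-pointing in replay_clone_task, the callback's write lands in the
clone's payload and leaves the original untouched; without it, every clone
would share, and overwrite, the original's capture block.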

New libomp tests cover lexical and non-lexical shared captures,
pointer captures, non-trivial types, recursive recordings,
stack-depth differences across replays, and the
saved/expired-graph cases.

Assisted-By: Claude Opus 4.7


