Jakub,
the following patch series implements the reduction handling for OpenACC:
01-trunk-reductions-core-1102.patch Core execution changes
02-trunk-reductions-ptx-1102.patch PTX backend bits
03-trunk-reductions-tests-1102.patch Testcases
The reduction mechanism relies on a new internal builtin -- IFN_GOACC_REDUCTION,
which is used in 4 different places. If you recall, the loop partitioning is managed with
FORK and JOIN unique_fn markers. The reductions go around these as follows:
  IFN_UNIQUE (HEAD_MARKER ...)
  IFN_REDUCTION (SETUP ...)
  IFN_UNIQUE (FORK ...)
  IFN_REDUCTION (INIT ...)
  IFN_UNIQUE (HEAD_MARKER)
  <loop here>
  IFN_UNIQUE (TAIL_MARKER ...)
  IFN_REDUCTION (FINI ...)
  IFN_UNIQUE (JOIN ...)
  IFN_REDUCTION (TEARDOWN ...)
  IFN_UNIQUE (TAIL_MARKER)
There's a quad of functions for each reduction variable of the loop. If a loop
is partitioned over multiple dimensions, there are additional quads for each
dimension, surrounding the fork/join for that dimension.
All the reduction calls look similar and are:
V = REDUCTION (KIND, REF_TO_RES, LOCAL_VAR, LEVEL, OP, OFFSET)
REF_TO_RES is a pointer to a receiver object. It is a null pointer constant if
there is no such object.
LOCAL_VAR is the executing thread's instance of the reduction variable.
LEVEL is the dimension across which this reduction is partitioned (gang, worker,
vector). As with the head/tail markers, this assignment of level is deferred to
the target compiler.
OP is the reduction operator.
OFFSET is an offset into a hypothetical buffer allocated for all the reductions
of this particular loop. It's a way of identifying which quad of reductions
apply to the same logical variable, and happens to be useful in some use cases
(I'll expand on that in the PTX fragment).
All these functions return a new value for the local variable.
When everything collapses to a single thread (i.e. on the host), the
implementation of these functions is trivial.
SETUP
- if REF_TO_RES is not a null pointer constant, return *REF_TO_RES, else return
LOCAL_VAR (this is a compile-time check)
INIT & FINI
- return LOCAL_VAR
TEARDOWN
- if REF_TO_RES is not a null pointer constant, *REF_TO_RES = LOCAL_VAR.
Always return LOCAL_VAR.
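
For illustration, the single-thread behaviour described above can be sketched in
C roughly as follows. This is only a sketch under stated assumptions, not the
code in the patches: host_reduction, host_sum and the RED_* enumerators are
hypothetical names, the reduction variable is assumed to be an int, and LEVEL,
OP and OFFSET are left out because the host case ignores them. The null check
is done at run time here, whereas the real check is a compile-time one.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the four reduction kinds.  */
enum reduction_kind { RED_SETUP, RED_INIT, RED_FINI, RED_TEARDOWN };

/* Single-thread model of V = REDUCTION (KIND, REF_TO_RES, LOCAL_VAR, ...).
   Every kind returns the new value of the local variable.  */
static int
host_reduction (enum reduction_kind kind, int *ref_to_res, int local_var)
{
  switch (kind)
    {
    case RED_SETUP:
      /* Pick up the incoming value from the receiver object, if any.  */
      return ref_to_res ? *ref_to_res : local_var;

    case RED_INIT:
    case RED_FINI:
      /* Nothing to do when there is only one thread.  */
      return local_var;

    case RED_TEARDOWN:
      /* Publish the result to the receiver object, if any.  */
      if (ref_to_res)
        *ref_to_res = local_var;
      return local_var;
    }
  return local_var;
}

/* Hypothetical driver: a '+' reduction over 0 .. n-1, with the four calls
   placed around the loop in the order given earlier.  */
static int
host_sum (int *receiver, int start, int n)
{
  int v = start;

  v = host_reduction (RED_SETUP, receiver, v);
  v = host_reduction (RED_INIT, receiver, v);
  for (int i = 0; i < n; i++)
    v += i;
  v = host_reduction (RED_FINI, receiver, v);
  v = host_reduction (RED_TEARDOWN, receiver, v);
  return v;
}
```

With a receiver holding 10, host_sum (&r, 0, 5) starts from the receiver's
value and writes the final result back to it. With a null receiver, the local
starting value is used and nothing is stored.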