Hi!

This patch series further reduces the overhead of launching kernels on GCN devices, on top of the already-landed patches. It removes a redundant allocation and reduces the cost of constructing the target variable table (the table of addresses of mapped variables in device memory) by moving the table into host memory and the kernel arguments. The table can then piggy-back on the previously added kernel argument cache, so that in most cases no new target variable table needs to be allocated.
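
For illustration only, here is a minimal sketch in C of the idea described above. It is not the code from the patches. All names (struct kernargs, MAX_INLINE_VARS, acquire_kernargs, release_kernargs, prepare_launch) are hypothetical, and the one-slot cache and fixed inline capacity are simplifying assumptions. It only shows why storing the table inside a cached kernel argument block avoids a separate allocation per launch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical capacity of the inline table; an assumption for this sketch.  */
#define MAX_INLINE_VARS 8

/* Hypothetical kernel argument block.  The target variable table (device
   addresses of the mapped variables) lives inside the block itself rather
   than in a separately allocated buffer.  */
struct kernargs
{
  uint64_t num_vars;
  uint64_t tvt[MAX_INLINE_VARS];
};

/* One-slot cache of a previously used block, and a counter of real
   allocations so the effect of the cache can be observed.  */
struct kernargs *cached_kernargs;
int kernargs_allocs;

/* Reuse the cached block when there is one, otherwise allocate.  */
struct kernargs *
acquire_kernargs (void)
{
  struct kernargs *k = cached_kernargs;
  if (k != NULL)
    {
      cached_kernargs = NULL;
      return k;
    }
  kernargs_allocs++;
  return malloc (sizeof *k);
}

/* Hand a block back after the kernel finished; keep at most one cached.  */
void
release_kernargs (struct kernargs *k)
{
  assert (k != NULL);
  if (cached_kernargs == NULL)
    cached_kernargs = k;
  else
    free (k);
}

/* Fill the inline table for one launch.  Returns NULL when the table does
   not fit; a real implementation would need a fallback path there.  */
struct kernargs *
prepare_launch (const uint64_t *devaddrs, size_t n)
{
  if (n > MAX_INLINE_VARS)
    return NULL;
  struct kernargs *k = acquire_kernargs ();
  if (k == NULL)
    return NULL;
  k->num_vars = n;
  memcpy (k->tvt, devaddrs, n * sizeof *devaddrs);
  return k;
}
```

In this sketch, a second launch that follows a release reuses the same block, so the allocation counter stays at one. The table costs nothing extra because it is part of the block that is cached anyway.
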
It also introduces the concept of "offload sessions", which carry the state required to start a target region and can be expanded further in the future. This reduces the overhead of launching a kernel by ~27%.

The series also reduces the overhead of launching threads on the actual GCN device by parallelizing thread initialization, akin to what the patch proposed by Matthew Malcomson does here:

https://inbox.sourceware.org/gcc-patches/[email protected]/

Arsen Arsenović (4):
  libgomp/gcn: parallelize initializing threads of a team
  libgomp: let plugins handle allocating the target variable table
  libgomp/plugin-gcn: remove unneeded heap allocation in run_kernel
  libgomp/oacc-mem: add missing assert to goacc_enter_datum

 include/gomp-constants.h             |   2 +-
 libgomp/config/gcn/team.c            | 121 ++++---
 libgomp/libgomp-plugin.h             |  81 ++++-
 libgomp/libgomp.h                    |  58 +++-
 libgomp/oacc-host.c                  |  63 +++-
 libgomp/oacc-mem.c                   |  11 +-
 libgomp/oacc-parallel.c              |  24 +-
 libgomp/plugin/plugin-gcn.c          | 310 ++++++++++++------
 libgomp/plugin/plugin-nvptx.c        |  45 ++-
 libgomp/target.c                     | 191 +++++++----
 libgomp/task.c                       |  33 +-
 .../gcn-kernel-launch-no-tvt-alloc.c |  51 +++
 .../gcn-kernel-launch-tvt-alloc.c    |  16 +
 13 files changed, 750 insertions(+), 256 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c

-- 
2.54.0
