Arsen Arsenović <[email protected]> writes:

> In my examination of BabelStream results on AMD GCN, I've found that,
> for each BabelStream kernel execution, we spend significant time in
> allocating and initializing memory in gomp_map_vars (~55µs, whereas the
> actual BabelStream code executes in ~746µs, meaning we increase the time
> BabelStream measures by 7% just on that).
>
> Upon further examination, I've found that the only reason gomp_map_vars
> decides to allocate and map any memory in the first place is because it
> is constructing the table of pointers to variables on the target, which
> I've taken to calling the "target variable table". Given that the GCN
> plugin already must perform some memory allocation before starting up a
> kernel, namely to allocate kernel arguments, it would be beneficial if
> we could merge this allocation with the kernel arguments allocation.
>
> In addition, since the kernel arguments live in host memory, populating
> them can be performed using string functions, without any need to call
> for expensive host2dev copies.
>
> This patch introduces an opaque type for "offload sessions". This type
> is defined by each plugin and allows it to store data related to a
> single offload job. The sessions are allocated and managed by libgomp,
> and initialized and utilized by the plugin. Their lifetime starts with
> a call to GOMP_OFFLOAD_session_start, and ends with
> GOMP_OFFLOAD_{openacc_{async_,}exec,{async_,}run}.
>
> The patch then uses this framework to make management of the target
> variable table more flexible: the plugin may elect to implement
> GOMP_OFFLOAD_session_allocate_target_var_table, which allows the plugin
> to attempt to allocate the target variable table in host memory.
>
> If it fails, or if the plugin does not provide this function, libgomp
> will perform this allocation as it does today - in target memory - and
> tell the session about it using
> GOMP_OFFLOAD_session_set_target_var_table.
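The session protocol described above can be modelled in a few lines of C. This is a minimal sketch under stated assumptions, not the patch's code: `struct plugin_ops`, the `toy_*` plugin functions and `map_tvt` are hypothetical names, the real `GOMP_OFFLOAD_*` hooks take more arguments, and `malloc` stands in for a device allocation. It shows only the control flow: libgomp sizes and starts the session, tries the optional host-side table allocation, and otherwise allocates the table itself and reports it through the mandatory "set" hook.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the plugin hooks; the real GOMP_OFFLOAD_* entry
   points have different, longer signatures.  */
struct plugin_ops
{
  size_t (*session_size) (void);
  void (*session_start) (void *session);
  /* Optional hook (may be NULL): return host memory for an N-entry
     target variable table, or NULL to decline.  */
  void **(*session_allocate_target_var_table) (void *session, size_t n);
  /* Mandatory hook: told where the table lives when libgomp allocated it.  */
  void (*session_set_target_var_table) (void *session, void **tvt);
};

/* Hypothetical toy plugin session with inline room for a small table.  */
struct toy_session
{
  void **tvt;
  int tvt_in_session;
  void *inline_tvt[8];
};

static size_t
toy_session_size (void)
{
  return sizeof (struct toy_session);
}

static void
toy_session_start (void *session)
{
  struct toy_session *s = session;
  s->tvt = NULL;
  s->tvt_in_session = 0;
}

static void **
toy_allocate_tvt (void *session, size_t n)
{
  struct toy_session *s = session;
  if (n > sizeof s->inline_tvt / sizeof s->inline_tvt[0])
    return NULL;                /* Too large: decline, libgomp falls back.  */
  s->tvt = s->inline_tvt;
  s->tvt_in_session = 1;
  return s->tvt;
}

static void
toy_set_tvt (void *session, void **tvt)
{
  struct toy_session *s = session;
  s->tvt = tvt;
  s->tvt_in_session = 0;
}

/* Hypothetical libgomp side: prefer the plugin's host-side table,
   otherwise allocate one (malloc stands in for a device allocation) and
   report it to the session.  *FALLBACK is set to 1 when the caller owns
   the returned table and must release it.  */
static void **
map_tvt (const struct plugin_ops *ops, void *session, size_t n, int *fallback)
{
  void **tvt = NULL;

  assert (ops->session_set_target_var_table != NULL);
  if (ops->session_allocate_target_var_table != NULL)
    tvt = ops->session_allocate_target_var_table (session, n);

  *fallback = (tvt == NULL);
  if (*fallback)
    {
      tvt = malloc (n * sizeof (void *));
      if (tvt != NULL)
        ops->session_set_target_var_table (session, tvt);
    }
  return tvt;
}
```

With the toy plugin, tables of up to eight entries come from the session's own storage, while larger tables, or any table when the optional hook is NULL, take the fallback path.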
>
> In the case of AMD GCN, upon a call to
> GOMP_OFFLOAD_session_allocate_target_var_table, the plugin will
> immediately allocate kernel arguments with enough space for the target
> variable table, no matter what size libgomp asks for[1], and return
> that pointer to libgomp.
>
> This results in the runtime of gomp_map_vars effectively disappearing
> from traces.
>
> [1] It may be beneficial to limit this to some fixed amount, so that
>     the future allocation cache has a higher hit rate. It may also
>     depend on whether hsa_memory_allocate for kernel arguments takes
>     runtime proportional to the number of bytes it needs to allocate.
>
> include/ChangeLog:
>
>         * gomp-constants.h (GOMP_VERSION): Bump. Signature of
>         GOMP_OFFLOAD_run et al changed.
>
> libgomp/ChangeLog:
>
>         * libgomp-plugin.h (GOMP_OFFLOAD_run, GOMP_OFFLOAD_exec)
>         (GOMP_OFFLOAD_async_run, GOMP_OFFLOAD_openacc_async_exec): Pass
>         session in place of target variable table and devices.
>         (struct gomp_offload_session): New.
>         (GOMP_OFFLOAD_session_size): New.
>         (GOMP_OFFLOAD_check_session_struct): New.
>         (GOMP_OFFLOAD_session_boilerplate): New.
>         (GOMP_OFFLOAD_session_start): New.
>         (GOMP_OFFLOAD_session_allocate_target_var_table): New.
>         (GOMP_OFFLOAD_session_set_target_var_table): New.
>         * libgomp.h (struct gomp_target_task): Add offload_session
>         field.
>         (struct gomp_device_descr): Add offload session management
>         functions.
>         (gomp_offload_session_new): New.
>         (goacc_map_vars): Add SESSION to signature.
>         * oacc-host.c (struct gomp_offload_session): Define, for host
>         offload fallback case.
>         (host_session_size): New. Implements GOMP_OFFLOAD_session_size.
>         (host_session_start): New. Implements
>         GOMP_OFFLOAD_session_start.
>         (host_session_set_target_var_table): New. Implements
>         GOMP_OFFLOAD_session_set_target_var_table.
>         (host_run): Adjust to match GOMP_OFFLOAD_run.
>         (host_openacc_exec): Adjust to match GOMP_OFFLOAD_openacc_exec.
>         (host_openacc_async_exec): Adjust to match
>         GOMP_OFFLOAD_openacc_async_exec.
>         * oacc-mem.c (acc_map_data): Adjust call to goacc_map_vars.
>         (goacc_enter_datum): Ditto.
>         (goacc_enter_data_internal): Ditto.
>         * oacc-parallel.c (GOACC_parallel_keyed): Allocate and pass
>         offload session.
>         (GOACC_data_start): Adjust call to goacc_map_vars.
>         * plugin/plugin-gcn.c (struct kernel_dispatch): Remove
>         kernarg_cache_node.
>         (struct kernargs): Add a flexible array member for the target
>         variable table.
>         (struct kernel_launch): Store an offload session rather than
>         target var. table pointer.
>         (print_kernel_dispatch): Receive kernargs as parameter.
>         (struct gomp_offload_session): Define.
>         (init_session): New.
>         (GOMP_OFFLOAD_session_start): Implement, using init_session.
>         (release_session): New.
>         (alloc_kernargs_on_agent): Rename to...
>         (allocate_session_kernargs): ... this, store result in
>         passed-in SESSION, and allocate extra room for target variable
>         table (rounding it up to nearest multiple of 64 pointers).
>         (GOMP_OFFLOAD_session_allocate_target_var_table): Implement
>         using the previous function.
>         (GOMP_OFFLOAD_session_set_target_var_table): Ditto.
>         (create_kernel_dispatch): Remove kernarg allocation, instead
>         receiving it as an argument.
>         (release_kernel_dispatch): Receive kernargs as an argument,
>         don't release them.
>         (run_kernel): Adjust to use sessions.
>         (destroy_module): Ditto.
>         (GOMP_OFFLOAD_load_image): Ditto.
>         (execute_queue_entry): Adjust to match changed struct
>         kernel_launch.
>         (queue_push_launch): Ditto.
>         (gcn_exec): Receive and pass along session.
>         (GOMP_OFFLOAD_run): Ditto.
>         (GOMP_OFFLOAD_async_run): Ditto.
>         (GOMP_OFFLOAD_openacc_exec): Ditto.
>         (GOMP_OFFLOAD_openacc_async_exec): Ditto.
>         * plugin/plugin-nvptx.c (struct gomp_offload_session): Define.
>         (GOMP_OFFLOAD_session_start): Implement.
>         (GOMP_OFFLOAD_session_set_target_var_table): Implement.
>         (GOMP_OFFLOAD_openacc_exec): Adjust to receive session.
>         (GOMP_OFFLOAD_openacc_async_exec): Ditto.
>         (GOMP_OFFLOAD_run): Ditto.
>         * target.c (gomp_get_tvt_size): Extract helper from...
>         (gomp_map_vars_internal): ... here. Receive SESSION, iff doing
>         target offload. Use a target variable table on the host
>         allocated by GOMP_OFFLOAD_session_allocate_target_var_table if
>         possible, or call GOMP_OFFLOAD_session_set_target_var_table with
>         an allocated device pointer otherwise.
>         (gomp_map_vars): Update to pass along session.
>         (goacc_map_vars): Ditto.
>         (GOMP_target): Allocate and pass along session.
>         (GOMP_target_ext): Ditto.
>         (gomp_target_data_fallback): Adjust call to gomp_map_vars.
>         (GOMP_target_data): Ditto.
>         (GOMP_target_data_ext): Ditto.
>         (GOMP_target_enter_exit_data): Ditto.
>         (gomp_target_task_fn): Start and pass along session, the storage
>         for which is allocated by gomp_create_target_task.
>         (DLSYM2): Rename from DLSYM, adding a new parameter for the
>         variable to populate, akin to DLSYM_OPT.
>         (DLSYM): Delegate to DLSYM2.
>         (gomp_load_plugin_for_device): Populate session-related fields.
>         * task.c (gomp_create_target_task): Allocate enough storage for
>         an offload session.
>         * testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c: New
>         test.
>         * testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c: New
>         test.
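To illustrate the plugin-gcn.c change to allocate_session_kernargs, here is a sketch of kernel arguments followed by a flexible array member holding the target variable table, with the table capacity rounded up to a multiple of 64 pointers. Only the flexible-array layout and the rounding rule come from the ChangeLog; the field names of the hypothetical `struct kernargs_sketch`, the helper names, and the use of `malloc` in place of the kernarg memory allocation are assumptions for illustration. Because the kernel arguments live in host memory, the table is filled with `memcpy` rather than a host-to-device copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for plugin-gcn.c's struct kernargs: the fixed
   fields are placeholders; the trailing flexible array member holds the
   target variable table in the same allocation.  */
struct kernargs_sketch
{
  uint64_t fixed_args[4];   /* Placeholder for the ordinary kernel arguments.  */
  void *tvt[];              /* Target variable table.  */
};

/* Table capacity is rounded up to a multiple of this many pointers.  */
#define TVT_GRANULE 64

static size_t
round_up_tvt (size_t n)
{
  return (n + TVT_GRANULE - 1) / TVT_GRANULE * TVT_GRANULE;
}

/* One allocation covers both the kernel arguments and the table.
   malloc stands in for allocating from the kernarg memory region.  */
static struct kernargs_sketch *
alloc_kernargs_with_tvt (size_t n, size_t *capacity)
{
  size_t cap = round_up_tvt (n);
  struct kernargs_sketch *ka = malloc (sizeof *ka + cap * sizeof (void *));
  if (ka != NULL)
    *capacity = cap;
  return ka;
}

/* The memory is host-accessible, so a plain memcpy populates the table.  */
static void
fill_tvt (struct kernargs_sketch *ka, size_t capacity,
          void *const *ptrs, size_t n)
{
  assert (n <= capacity);
  memcpy (ka->tvt, ptrs, n * sizeof (void *));
}
```

With this rounding, requests for anywhere from 1 to 64 entries all produce the same kernarg allocation size.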
> ---
>  include/gomp-constants.h                      |   2 +-
>  libgomp/libgomp-plugin.h                      |  81 +++++-
>  libgomp/libgomp.h                             |  27 +-
>  libgomp/oacc-host.c                           |  63 ++++-
>  libgomp/oacc-mem.c                            |   8 +-
>  libgomp/oacc-parallel.c                       |  24 +-
>  libgomp/plugin/plugin-gcn.c                   | 254 ++++++++++++------
>  libgomp/plugin/plugin-nvptx.c                 |  45 +++-
>  libgomp/target.c                              | 191 ++++++++-----
>  libgomp/task.c                                |  33 ++-
>  .../gcn-kernel-launch-no-tvt-alloc.c          |  51 ++++
>  .../gcn-kernel-launch-tvt-alloc.c             |  16 ++
>  12 files changed, 604 insertions(+), 191 deletions(-)
>  create mode 100644 libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c
>  create mode 100644 libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c
Ping.

-- 
Arsen Arsenović
