This patch fixes an OpenMP performance issue on NVPTX.
The problem is that GOMP_OFFLOAD_alloc deallocates the stack memory when it shouldn't, forcing the GOMP_OFFLOAD_run function to allocate the stack space again before every kernel launch.
The memory is only meant to be deallocated when a data allocation fails, in the hope that memory can be reallocated more efficiently, but there's an additional, unconditional deallocation that looks like it may have been vestigial debug code.
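
To illustrate the intended behaviour, here is a rough, self-contained sketch of the "free the cached stacks only when an allocation fails, then retry" pattern. It is a toy model, not the plugin code: struct fake_device, raw_alloc, stacks_free, device_alloc and device_launch are all hypothetical names invented for this example, and the "device" is just a byte budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device: a fixed memory budget plus a cached
   stack reservation.  None of these names come from libgomp.  */
struct fake_device
{
  size_t capacity;     /* Total device memory.  */
  size_t used;         /* Memory held by data allocations.  */
  size_t stacks_size;  /* Size of the cached stacks, 0 if none.  */
  int stack_allocs;    /* How many times the stacks were (re)allocated.  */
};

/* Raw allocation: succeeds only if the request fits in the free space.  */
static bool
raw_alloc (struct fake_device *dev, size_t size)
{
  assert (dev->used + dev->stacks_size <= dev->capacity);
  if (size > dev->capacity - dev->used - dev->stacks_size)
    return false;
  dev->used += size;
  return true;
}

/* Drop the cached stacks so their space can be reused.  */
static void
stacks_free (struct fake_device *dev)
{
  dev->stacks_size = 0;
}

/* Data allocation: try first with the stacks still cached; release them
   only after a failure, then retry once.  */
static bool
device_alloc (struct fake_device *dev, size_t size)
{
  if (raw_alloc (dev, size))
    return true;
  stacks_free (dev);
  return raw_alloc (dev, size);
}

/* Kernel launch: reuse the cached stacks when they are large enough,
   otherwise allocate them afresh (the costly path).  */
static bool
device_launch (struct fake_device *dev, size_t stacks_needed)
{
  if (dev->stacks_size >= stacks_needed)
    return true;
  stacks_free (dev);
  if (stacks_needed > dev->capacity - dev->used)
    return false;
  dev->stacks_size = stacks_needed;
  dev->stack_allocs++;
  return true;
}
```

In this model, an allocation that fits leaves the stacks alone, so the next launch reuses them and stack_allocs does not change. Calling stacks_free unconditionally at the top of device_alloc would instead force a fresh stack allocation on the launch following every data allocation.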
Fixing the issue gives a 3x speed-up running the BabelStream benchmark.

Andrew
nvptx: remove erroneous stack deletion

The stacks are not supposed to be deleted every time memory is allocated,
only when there is insufficient memory.  The unconditional call here seems
to be in error, and is causing a costly reallocation of the stacks before
every launch.

libgomp/

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_alloc): Remove early call to
	nvptx_stacks_free.

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 942fb989bac..21db2bd29c8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1411,8 +1411,6 @@ GOMP_OFFLOAD_alloc (int ord, size_t size)
   ptx_dev->free_blocks = NULL;
   pthread_mutex_unlock (&ptx_dev->free_blocks_lock);
 
-  nvptx_stacks_free (ptx_dev, false);
-
   while (blocks)
     {
       tmp = blocks->next;