Hi Tom,
this patch adjusts nvptx_free() in libgomp/plugin/plugin-nvptx.c to avoid a
"GOMP_PLUGIN_acc_thread() == NULL" check that was causing problems under
OpenMP offloading.

This check was originally used to determine whether nvptx_free() was running
in CUDA callback context, when freeing resources from an OpenACC asynchronous
compute region. Since CUDA API calls are not allowed inside callback context,
we have to save the freed block to ptx_dev->free_blocks and cuMemFree it later.
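
For readers not familiar with this part of the plugin, the deferral scheme
described above can be sketched roughly as follows. This is only an
illustration, not the libgomp code: struct device, struct free_block,
defer_free and flush_deferred are made-up names, and the release callback
stands in for the real device free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the plugin's structures; not libgomp code.  */
struct free_block
{
  void *ptr;
  struct free_block *next;
};

struct device
{
  struct free_block *free_blocks;
  pthread_mutex_t free_blocks_lock;
};

/* Called where the device API may not be used: only record the pointer.  */
static bool
defer_free (struct device *dev, void *p)
{
  assert (dev != NULL);
  struct free_block *n = malloc (sizeof *n);
  if (n == NULL)
    return false;
  n->ptr = p;
  pthread_mutex_lock (&dev->free_blocks_lock);
  n->next = dev->free_blocks;
  dev->free_blocks = n;
  pthread_mutex_unlock (&dev->free_blocks_lock);
  return true;
}

/* Called later from a normal host thread: detach the whole list under the
   lock, then release each block through RELEASE (a placeholder for the
   real device free).  Returns the number of blocks released.  */
static size_t
flush_deferred (struct device *dev, void (*release) (void *))
{
  assert (dev != NULL);
  pthread_mutex_lock (&dev->free_blocks_lock);
  struct free_block *b = dev->free_blocks;
  dev->free_blocks = NULL;
  pthread_mutex_unlock (&dev->free_blocks_lock);

  size_t count = 0;
  while (b != NULL)
    {
      struct free_block *next = b->next;
      release (b->ptr);
      free (b);
      b = next;
      count++;
    }
  return count;
}
```

The point of the sketch is that the list only shrinks when something like
flush_deferred actually runs. If every free takes the defer_free path, the
list grows without bound.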

Checking whether GOMP_PLUGIN_acc_thread() returns NULL to distinguish a normal
host thread from a callback thread worked under -fopenacc. Under -fopenmp,
however, the OpenACC per-thread data is never created, so the function always
returns NULL. The result is a leak: OpenMP offloading keeps accumulating freed
device memory blocks until it fails, because nvptx_free() never reaches the
part where cuMemFree() is actually called.

I reviewed the CUDA API docs and see that CUDA_ERROR_NOT_PERMITTED is returned
for such CUDA calls inside callback context, and it appears to be enough to
replace the current check. The new code therefore detects callback context by
checking whether this error is returned by the first cuMemGetAddressRange
call. This should now work for both OpenACC and OpenMP.
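
To make the difference between the two checks concrete, here is a small
stand-alone model of the decision. It does not call CUDA: enum probe_status
is a hypothetical stand-in for the CUresult of the probing call, and the
function names are made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CUresult values of the probing call.  */
enum probe_status { PROBE_SUCCESS, PROBE_NOT_PERMITTED, PROBE_OTHER_ERROR };

enum free_action { FREE_NOW, FREE_LATER, FREE_FAILED };

/* Old scheme: defer whenever there is no OpenACC per-thread data.  Under
   -fopenmp that data never exists, so every free is deferred.  */
static enum free_action
old_choose_action (const void *acc_thread_data)
{
  return acc_thread_data == NULL ? FREE_LATER : FREE_NOW;
}

/* New scheme: let the status of the first device API call decide.  */
static enum free_action
choose_action (enum probe_status r)
{
  switch (r)
    {
    case PROBE_SUCCESS:
      return FREE_NOW;
    case PROBE_NOT_PERMITTED:
      return FREE_LATER;
    case PROBE_OTHER_ERROR:
      return FREE_FAILED;
    }
  assert (!"unexpected probe status");
  return FREE_FAILED;
}
```

For an OpenMP host thread, old_choose_action (NULL) yields FREE_LATER even
though the probe would succeed, whereas choose_action (PROBE_SUCCESS) yields
FREE_NOW.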

(Tobias, Catherine: the earlier internal patch to reorganize this callback
context checking did not work in general, since OpenACC also uses the
.queue_callback functionality to free the struct target_mem_desc
asynchronously. So in general we have to ensure that nvptx_free() can be used
in both normal and callback context.)

This patch has been libgomp-tested on x86_64-linux with nvptx offloading
without regressions, and should be applied to mainline and GCC 10. Is this
okay?

Thanks,
Chung-Lin

2020-08-20  Chung-Lin Tang  <clt...@codesourcery.com>

        libgomp/
        * plugin/plugin-nvptx.c (nvptx_free): Change
        "GOMP_PLUGIN_acc_thread () == NULL" test into check of
        CUDA_ERROR_NOT_PERMITTED status for cuMemGetAddressRange.
        Adjust comments.
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index ec103a2f40b..188a34f1d04 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1038,27 +1038,34 @@ goacc_profiling_acc_ev_free (struct goacc_thread *thr, void *p)
 }
 
 static bool
 nvptx_free (void *p, struct ptx_device *ptx_dev)
 {
-  /* Assume callback context if this is null.  */
-  if (GOMP_PLUGIN_acc_thread () == NULL)
+  CUdeviceptr pb;
+  size_t ps;
+
+  CUresult r = CUDA_CALL_NOCHECK (cuMemGetAddressRange, &pb, &ps,
+                                 (CUdeviceptr) p);
+  if (r == CUDA_ERROR_NOT_PERMITTED)
     {
+      /* We assume that this error indicates we are in a CUDA callback context,
+        where CUDA calls are not allowed.  Arrange to free this piece of
+        device memory later.  */
       struct ptx_free_block *n
        = GOMP_PLUGIN_malloc (sizeof (struct ptx_free_block));
       n->ptr = p;
       pthread_mutex_lock (&ptx_dev->free_blocks_lock);
       n->next = ptx_dev->free_blocks;
       ptx_dev->free_blocks = n;
       pthread_mutex_unlock (&ptx_dev->free_blocks_lock);
       return true;
     }
-
-  CUdeviceptr pb;
-  size_t ps;
-
-  CUDA_CALL (cuMemGetAddressRange, &pb, &ps, (CUdeviceptr) p);
+  else if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("cuMemGetAddressRange error: %s", cuda_error (r));
+      return false;
+    }
   if ((CUdeviceptr) p != pb)
     {
       GOMP_PLUGIN_error ("invalid device address");
       return false;
     }
