Hi Thomas,

Thomas Schwinge wrote:
/* Return the number of GCN devices on the system. */
int
-GOMP_OFFLOAD_get_num_devices (void)
+GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
  {
    if (!init_hsa_context ())
      return 0;
+  /* Return -1 if the omp_requires_mask cannot be fulfilled but
+     devices were present.  */
+  if (hsa_context.agent_count > 0 && omp_requires_mask != 0)
+    return -1;
    return hsa_context.agent_count;
  }
...
OK to push the attached "nvptx: 'cuDeviceGetCount' failure is fatal"?

I think the real question is: what does a 'cuDeviceGetCount' failure mean?

Does it mean a serious error – or could it just be a permissions issue, such that the user has no device access but everything else is fine?

Because if it is, e.g., a permission problem – just returning '0' (no devices) would seem to be the proper solution.

But if it is expected to be always something serious, well, then a fatal error makes more sense.

The possible return values of 'cuDeviceGetCount' are:

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

which does not really help.

My impression is that a device count of 0 is usually returned if something goes wrong (e.g. with permissions), so that an actual error return is a real exception. But all three choices seem to make about equal sense: either host fallback (by returning 0 or -1) or a fatal error.
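
To make the three options concrete, here is a minimal sketch in plain C. It does not call CUDA and is not the plugin code: the function name, the enum and its values, and the 'query_ok'/'n_found' parameters are all hypothetical stand-ins for "did the count query succeed" and "what count did it report". The use of -1 merely mirrors the convention in the GCN hunk quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical policy names for illustration; none of these exist in
   libgomp.  */
enum count_failure_policy
{
  FAILURE_MEANS_NO_DEVICES,  /* Return 0: behave as if no device exists.  */
  FAILURE_MEANS_UNUSABLE,    /* Return -1: devices may exist but are unusable.  */
  FAILURE_IS_FATAL           /* Print a message and terminate.  */
};

/* QUERY_OK stands in for "the device-count query succeeded" and N_FOUND
   for the count it reported.  Both are assumptions of this sketch.  */
static int
get_num_devices_sketch (int query_ok, int n_found,
                        enum count_failure_policy policy)
{
  if (!query_ok)
    switch (policy)
      {
      case FAILURE_MEANS_NO_DEVICES:
        return 0;
      case FAILURE_MEANS_UNUSABLE:
        return -1;
      case FAILURE_IS_FATAL:
        fprintf (stderr, "fatal: device count query failed\n");
        exit (EXIT_FAILURE);
      }

  /* The query succeeded, so the reported count is passed through.  */
  assert (n_found >= 0);
  return n_found;
}
```

With the first two policies a failed query still lets the program continue on the host; only the third one terminates it. A successful query returns the reported count regardless of the policy.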

Tobias
