On 07/03/2024 11:29, Thomas Schwinge wrote:
Hi!

On 2019-11-12T13:29:16+0000, Andrew Stubbs <a...@codesourcery.com> wrote:
This patch contributes the GCN libgomp plugin, with the various
configure and make bits to go with it.

An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
different from the libgomp-level host-fallback execution):

--- /dev/null
+++ b/libgomp/plugin/plugin-gcn.c

+/* Flag to decide if the runtime should suppress a possible fallback to host
+   execution.  */
+
+static bool suppress_host_fallback;

+static void
+init_environment_variables (void)
+{
+  [...]
+  if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK"))
+    suppress_host_fallback = true;
+  else
+    suppress_host_fallback = false;

+/* Return true if the HSA runtime can run function FN_PTR.  */
+
+bool
+GOMP_OFFLOAD_can_run (void *fn_ptr)
+{
+  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
+
+  init_kernel (kernel);
+  if (kernel->initialization_failed)
+    goto failure;
+
+  return true;
+
+failure:
+  if (suppress_host_fallback)
+    GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
+  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
+  return false;
+}

This originates in the libgomp HSA plugin, where the idea was -- in my
understanding -- that you wouldn't have device code available for all
'fn_ptr's, and in that case transparently (shared-memory system!) do
host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
you'd get those diagnosed.

This has then been copied into the libgomp GCN plugin (see above).
However, is it really still applicable there; don't we assume that we're
generating device code for all relevant functions?  (I suppose everyone
really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)  Should we thus
actually remove 'suppress_host_fallback' (that is, make it
always-'true'), including removal of the 'can_run' hook?  (I suppose that
even in a future shared-memory "GCN" configuration, we're not expecting
to use this again; expecting always-'true' for 'can_run'?)


Now my actual issue: the libgomp GCN plugin then invented an additional
use of 'GCN_SUPPRESS_HOST_FALLBACK':

+/* Initialize hsa_context if it has not already been done.
+   Return TRUE on success.  */
+
+static bool
+init_hsa_context (void)
+{
+  hsa_status_t status;
+  int agent_index = 0;
+
+  if (hsa_context.initialized)
+    return true;
+  init_environment_variables ();
+  if (!init_hsa_runtime_functions ())
+    {
+      GCN_WARNING ("Run-time could not be dynamically opened\n");
+      if (suppress_host_fallback)
+       GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
+      return false;
+    }

That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its
original purpose), and you have the libgomp GCN plugin configured, but
don't have 'libhsa-runtime64.so.1' available, you run into a fatal error.

The libgomp nvptx plugin in such cases silently disables the
plugin/device (and thus lets libgomp proper do its thing), and I propose
we do the same here.  OK to push the attached
"GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 
'init_hsa_runtime_functions' is not fatal"?

If you try to run the offload testsuite on a device that is not properly configured then we want FAIL, not pass-via-fallback. You're breaking that.

Andrew

Reply via email to