Hi Sandra,

Sandra Loosemore wrote:
> On 3/1/24 08:23, Tobias Burnus wrote:
>> Maybe the proposed wording will help others to avoid this pitfall.
>> (Or is this superfluous as -foffload= is not much used and, even if,
>> no one then remembers or finds this none?)
>
> Well, I spent a long time looking at this, and my only conclusion is that I don't really understand what the problem you're trying to solve is.  If it's problematical to have the runtime know about offload devices the compiled code isn't using, don't users also need to know how to restrict the runtime to a particular set of devices the same way -foffload= lets you do, and not just how to disable offloading in the runtime entirely? It's pretty clearly documented already how -foffload affects the compiler's behavior, and the library's behavior is already documented in its own manual.  Maybe what we don't have is a tutorial on how to build/link/run programs using a specific offload device, or on the host?

The problem is for code like the following, which is perfectly valid
and works

(A) If you don't have any offload device
    (independent of the compiler options)

(B) If you have an offload device (supported by your libgomp)
    and compiled with offloading support (for that device)

But (C) if you have an offload device and compile as:
  gcc -fopenmp -foffload=disabled

it will fail at runtime with:

  dev = 0 / num devs = 1
  Segmentation fault (core dumped)

The problem is that there is a mismatch between the code (which assumes
no offload code + always host fallback) and the run-time library (which
detects offload devices), such that the API routines use a different
device than the 'target' code:
--------------------
#include <omp.h>
#include <stdio.h>

#define N 2064
int
main ()
{
  int *x = (int*) omp_target_alloc (sizeof(int)*N,
                                    omp_get_default_device ());
  printf ("dev = %d / num devs = %d\n",
          omp_get_default_device (), omp_get_num_devices ());
  #pragma omp target is_device_ptr(x)
  for (int i = 0; i < N; ++i)
    x[i] = i;
}
-------------------

On the technical side, it is not really surprising, but it
might still be confusing for the user. Obviously, it can
also occur if you compile, e.g., for AMD GCN and only an
Nvidia device is available - but there the solution would be
the same (disable all devices).
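
For illustration only, here is a minimal sketch of how the example above
could be made tolerant of this mismatch: probe where 'target' regions
really execute and pass that device number to the API routines. The
helper names effective_device and fill_and_sum are hypothetical, and the
stub routines in the #else branch are an assumption added only so that
the sketch also builds without -fopenmp. This is not the wording change
proposed in this thread, just one possible defensive coding pattern.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Assumption: host-only stubs so the sketch also builds without -fopenmp.  */
static int omp_is_initial_device (void) { return 1; }
static int omp_get_initial_device (void) { return 0; }
static int omp_get_default_device (void) { return 0; }
static void *omp_target_alloc (size_t size, int dev)
{ (void) dev; return malloc (size); }
static void omp_target_free (void *ptr, int dev)
{ (void) dev; free (ptr); }
#endif

#define N 2064

/* Hypothetical helper: return the device number on which 'target'
   regions of this binary actually execute.  If the region falls back
   to the host (e.g. no offload code was compiled in), use the initial
   device so that the API routines agree with the 'target' code.  */
int
effective_device (void)
{
  int on_host = 1;
  #pragma omp target map(from: on_host)
  on_host = omp_is_initial_device ();
  return on_host ? omp_get_initial_device () : omp_get_default_device ();
}

/* Hypothetical helper: same loop as in the example above, but the
   allocation and the 'target' region use the same, probed device.
   Returns the sum 0 + 1 + ... + (n-1), or -1 if the allocation failed.  */
long
fill_and_sum (int n)
{
  int dev = effective_device ();
  int *x = (int *) omp_target_alloc (sizeof (int) * n, dev);
  if (x == NULL)
    return -1;

  long sum = 0;
  #pragma omp target device(dev) is_device_ptr(x) map(tofrom: sum)
  for (int i = 0; i < n; ++i)
    {
      x[i] = i;
      sum += x[i];
    }

  omp_target_free (x, dev);
  return sum;
}
```

Calling fill_and_sum (N) from main should then behave the same in cases
(A), (B) and (C). Without touching the code, the run-time switch I would
expect "disable all devices" to correspond to is the environment setting
OMP_TARGET_OFFLOAD=disabled.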

(OpenMP 6.0 will provide an environment variable that allows
fine-tuning of the available devices.)


Questions:

* Is such a usage common enough to matter?
I guess for some benchmark use it makes sense, to test whether
real offloading or host fallback is faster + if the latter
is true, it might also get used in operational code.

* Are API routines used in such a code in a way that it breaks?
(Unfortunately not very unlikely in larger code.)

If there is enough real-world usage (= 2x yes to the questions above):
* How to word it to help users and not to confuse them?

Tobias
