https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122432

--- Comment #6 from Tobias Burnus <burnus at gcc dot gnu.org> ---
 (In reply to H.J. Lu from comment #5)
> (In reply to Tobias Burnus from comment #4)
> > (In reply to H.J. Lu from comment #3)
> I ran your script and it doesn't report any issues.

If you don't have an offload compiler, no offload code will be
produced and the code is happily run on the host. Thus, seeing no failure
does not imply that it works.

See below for some hints on how to check whether it works or not. But
let's start with some background:

* * *

What happens:

On the host side, all .gnu.offload_funcs sections will get
merged producing an array of offload functions:

__OFFLOAD_TABLE__[] __attribute__ ((__visibility__ ("hidden"))) =
{
  &__offload_func_table, &__offload_funcs_end
}

where __offload_func_table and __offload_funcs_end are in
crtoffloadbegin.o and crtoffloadend.o, respectively.

In this case, the table contains:
   0x401430 <d_inner_oacc_amax.0._omp_fn>
   0x401700 <d_inner_oacc_mlt_v_2.0._omp_fn>
   NULL

from the files psb_d_oacc_vect_mod-3.f90 and psb_d_oacc_mlt_v_2-2.f90,
respectively.

This part should be visible in the debugger even without an actual
offloading compiler.
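
As a rough illustration of the begin/end layout described above, here is a
minimal, self-contained sketch. It is not the libgcc or libgomp code: the
names demo_funcs, count_entries and demo_num_funcs are hypothetical, and
the two empty functions merely stand in for the outlined offload functions.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*offload_fn) (void);

/* Hypothetical stand-ins for the two outlined offload functions.  */
static void fn_amax (void) {}
static void fn_mlt_v_2 (void) {}

/* Stand-in for the merged .gnu.offload_funcs section contents.  */
static offload_fn demo_funcs[] = { fn_amax, fn_mlt_v_2 };

/* Number of entries in the half-open range [begin, end).  */
size_t
count_entries (const offload_fn *begin, const offload_fn *end)
{
  assert (begin <= end);
  return (size_t) (end - begin);
}

/* Build a begin/end pair like the table above and count the entries.  */
size_t
demo_num_funcs (void)
{
  const void *table[2] = { &demo_funcs[0], &demo_funcs[2] };
  return count_entries (table[0], table[1]);
}
```

With both functions present, demo_num_funcs () yields 2; this is the kind
of count the host side ends up with after the sections are merged.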

* * *

For the device side: When CLAIM_FILE_HOOK_V2 is called with offload
symbols present and 'known_used' is true, the device side LTO processes
that file - via a wrapper called mkoffload.

This wrapper collects from the device data a list of functions and
creates a constructor. For Nvidia, it looks like (vectoacc.xnvptx-none.c):


} func_mappings[] = {
        {"d_inner_oacc_amax$0$_omp_fn$0", 0, 0x1, 0x20}
};

...
} nvptx_data = { ...
  func_mappings,  sizeof (func_mappings) / sizeof (func_mappings[0]), ...
};
...
  GOMP_offload_register_ver (0x30001, __OFFLOAD_TABLE__, 5/*NVIDIA_PTX*/,
                             &nvptx_data);

* * *

This is passed to the runtime, which runs (libgomp/target.c's
GOMP_offload_register_ver):

  for (i = 0; i < num_devices; i++)
      if (devicep->type == target_type /* ... */)
        gomp_load_image_to_device (devicep, version,
                                   host_table, target_data, true);

and the latter has:

  num_target_entries
    = devicep->load_image_func (devicep->target_id, version,

  if (num_target_entries != num_funcs + num_vars + 1)
    { /* ... */
      gomp_fatal ("Cannot map target functions or variables"
                  " (expected %u + %u + 1, have %u)", num_funcs, num_vars,
                  num_target_entries);
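
The comparison quoted above can be tried in isolation with the following
sketch. The function names entry_counts_match and demo_mismatch are
hypothetical and the numbers are purely illustrative; only the condition
itself is taken from the excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the entry count reported by the device plugin agrees with
   what the host table expects; mirrors the quoted condition.  */
bool
entry_counts_match (unsigned num_funcs, unsigned num_vars,
                    unsigned num_target_entries)
{
  return num_target_entries == num_funcs + num_vars + 1;
}

/* Illustrative values only: the host table lists two functions and no
   variables.  */
void
demo_mismatch (void)
{
  /* Device image carries both functions: 2 + 0 + 1 entries.  */
  assert (entry_counts_match (2, 0, 3));
  /* Device image lacks one function: one entry short, so the real
     runtime would end up in gomp_fatal.  */
  assert (!entry_counts_match (2, 0, 2));
}
```

If the device image is missing a function that the host table lists, the
counts disagree and the runtime takes the gomp_fatal path.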

* * *

Thus, to see the failure as in the script, you need to (a) compile for an
offload device and (b) have a working offload device (vendor lib + device)
available.

If not:
(A) if you have an offload compiler such as for nvptx-none (or amdgcn-amdhsa),
    you can look at the array I have shown above (-save-temps) - to see that
    the second function is missing.
(B) Or you just look at the lto-plugin/lto-plugin.c handling, namely
    the CLAIM_FILE_HOOK_V2 and what it loads.

For the host side, the easiest way to see what remains is the
__OFFLOAD_TABLE__ array - which, if used in the constructor call,
can be stepped through in the debugger; this of course fails as soon as
the __OFFLOAD_TABLE__ symbol is dropped by the linker as being unused.

* * *

Hence: Without an offloading compiler, it is more difficult to see the
problem explicitly; it requires some printf debugging or stepping through
in the debugger.

With an offloading compiler configured, the -save-temps output already helps
with the device side – and running it in gdb + breakpointing in
GOMP_offload_register_ver or 'main' and printing __OFFLOAD_TABLE__ shows
the host side - which should neither require that the device itself is
available nor the runtime lib.

And finally, if a runtime lib + device is actually available, you will get
the full user experience of the runtime error.
