Re: [PATCH, nvptx] Expand OpenACC child function arguments to use CUDA params space

Tom de Vries Wed, 09 Oct 2019 06:34:09 -0700

On 08-10-2019 16:05, Thomas Schwinge wrote:
> Hi Chung-Lin!
>
> While we're all waiting for Tom to comment on this ;-)

Ack, thanks for the ping ...

> -- here's another item I realized:
>
> On 2019-09-10T19:41:59+0800, Chung-Lin Tang
> <chunglin_t...@mentor.com> wrote:
>> The libgomp nvptx plugin changes are also quite contained, with
>> lots of now unneeded [...] code deleted (since we no longer first
>> cuAlloc a buffer for the argument record before cuLaunchKernel)
>
> It would be nice ;-) -- but unless I'm confused, it's not that
> simple: we either have to reject (force host-fallback execution) or
> keep supporting "old-style" nvptx offloading code: new-libgomp has
> to continue to work with nvptx offloading code once generated by
> old-GCC.  Possibly even a mixture of old and new nvptx offloading
> code, if libraries are involved, huh!
>
> I have not completely thought that through, but I suppose this
> could be addressed by adding a flag to the 'struct nvptx_fn' (or
> similar) that's synthesized by nvptx 'mkoffload'?
>
> Maybe in fact the 'enum id_map_flag' machinery that I once added
> for 'Un-parallelized OpenACC kernels constructs with nvptx
> offloading: "avoid offloading"'?  (That's part of og8 commit
> 2d42fbf7e989e4bb76727b32ef11deb5845d5ab1 -- not present on og9,
> huh?!) The 'enum id_map_flag' machinery serves the purpose of
> transporting information from the offload compiler to libgomp,
> which seems what's needed here?  (But please verify.)
>
... and for raising this issue. I think this needs to be addressed.

It would be great if we could avoid it, but ... AFAIU, this means
bumping GOMP_VERSION_NVIDIA_PTX (1 -> 2).

Using a new a.out (registers with GOMP_VERSION_NVIDIA_PTX == 2) with
an old libgomp (supports GOMP_VERSION_NVIDIA_PTX <= 1) will give us an
"Offload data incompatible with PTX plugin" error.

Using an old a.out (registers with GOMP_VERSION_NVIDIA_PTX == 1) with
a new libgomp (supports GOMP_VERSION_NVIDIA_PTX <= 2) will have to be
supported in the way that things are currently handled.

Using a new a.out (registers with GOMP_VERSION_NVIDIA_PTX == 2) with a
new libgomp (supports GOMP_VERSION_NVIDIA_PTX <= 2) will have to be
supported in the way that the patch implements things.
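The three cases above can be sketched as a dispatch on the version an image registers with. This is a minimal C sketch under stated assumptions, not libgomp code: `enum launch_kind`, `select_launch_kind` and the idea that the plugin passes in its highest supported version are hypothetical stand-ins for the real GOMP_VERSION_NVIDIA_PTX check.

```c
#include <assert.h>

/* Hypothetical outcomes of loading an nvptx offload image.  */
enum launch_kind
{
  LAUNCH_REJECT,      /* Image too new: "Offload data incompatible with PTX plugin".  */
  LAUNCH_ARG_BUFFER,  /* Old style: allocate a device argument record, pass its address.  */
  LAUNCH_PARAM_SPACE  /* New style: pass the arguments via the CUDA params space.  */
};

/* IMAGE_VERSION is the version the a.out registered with (hypothetical
   parameter); PLUGIN_MAX_VERSION is the highest version this libgomp
   plugin supports (1 for an old libgomp, 2 for a new one).  */
static enum launch_kind
select_launch_kind (unsigned image_version, unsigned plugin_max_version)
{
  assert (plugin_max_version >= 1);
  if (image_version > plugin_max_version)
    return LAUNCH_REJECT;
  if (image_version < 2)
    return LAUNCH_ARG_BUFFER;
  return LAUNCH_PARAM_SPACE;
}
```

With an old plugin (maximum 1), a version-2 image is rejected; with a new plugin (maximum 2), version-1 images keep the argument-buffer path and version-2 images use the new one.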

The current approach assumes that all offload functions are
transformed by the optimization, which implies that failure to
transform should be a compilation error (is that indeed ensured by the
patch?). That is a bit odd for an 'optimization'. We might want to
decide to switch this on/off at the offload-function level.

That ties in with the fact that, if we're going to keep the old path
alive for backward compatibility, it would be nice to be able to test
it on trunk by disabling the optimization. That is also useful if we
run into issues with the optimization. And once we allow this to be
disabled at the user level, we're going to have to track it at the
offload-function level.

So I'd say that for GOMP_VERSION_NVIDIA_PTX == 2 we extend target_data
with a flag such that we can query things per offload function, while
taking care to represent the common case, where the flag is the same
for all offload functions, in an economical way.
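A minimal sketch of such an economical per-function flag, assuming a hypothetical side table emitted by mkoffload next to the existing target data: in the common case only a single uniform value is stored and the per-function byte array is omitted (NULL), so only images that mix old- and new-style functions pay one byte per offload function. The struct and function names are hypothetical, and a NULL table is assumed to stand for a version-1 image, which always takes the old argument-record path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-image table for GOMP_VERSION_NVIDIA_PTX == 2 images.  */
struct nvptx_params_space_info
{
  bool uniform;                  /* Flag shared by all functions if FN_FLAGS is NULL.  */
  const unsigned char *fn_flags; /* NULL in the common case; else one byte per function.  */
  unsigned fn_num;               /* Number of offload functions in the image.  */
};

/* Return true if offload function FN_INDEX expects its arguments in the
   CUDA params space, false if it expects the old-style argument record.
   A NULL INFO models a version-1 image, which predates the flag.  */
static bool
fn_uses_params_space (const struct nvptx_params_space_info *info,
                      unsigned fn_index)
{
  if (info == NULL)
    return false;
  assert (fn_index < info->fn_num);
  if (info->fn_flags == NULL)
    return info->uniform;
  return info->fn_flags[fn_index] != 0;
}
```

Keeping the uniform value separate from the optional array means the all-new (or all-disabled) image costs one field rather than a table entry per function.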

That leaves the question of how to get that information to mkoffload;
perhaps the patch Thomas mentioned can be of help there.

Thanks,
- Tom