On 08-10-2019 16:05, Thomas Schwinge wrote:
> Hi Chung-Lin!
>
> While we're all waiting for Tom to comment on this ;-)

Ack, thanks for the ping ...

> -- here's another item I realized:
>
> On 2019-09-10T19:41:59+0800, Chung-Lin Tang
> <chunglin_t...@mentor.com> wrote:
>> The libgomp nvptx plugin changes are also quite contained, with
>> lots of now unneeded [...] code deleted (since we no longer first
>> cuAlloc a buffer for the argument record before cuLaunchKernel)
>
> It would be nice ;-) -- but unless I'm confused, it's not that
> simple: we either have to reject (force host-fallback execution) or
> keep supporting "old-style" nvptx offloading code: new-libgomp has
> to continue to work with nvptx offloading code once generated by
> old-GCC. Possibly even a mixture of old and new nvptx offloading
> code, if libraries are involved, huh!
>
> I have not completely thought that through, but I suppose this
> could be addressed by adding a flag to the 'struct nvptx_fn' (or
> similar) that's synthesized by nvptx 'mkoffload'?
>
> Maybe in fact the 'enum id_map_flag' machinery that I once added
> for 'Un-parallelized OpenACC kernels constructs with nvptx
> offloading: "avoid offloading"'? (That's part of og8 commit
> 2d42fbf7e989e4bb76727b32ef11deb5845d5ab1 -- not present on og9,
> huh?!) The 'enum id_map_flag' machinery serves the purpose of
> transporting information from the offload compiler to libgomp,
> which seems what's needed here? (But please verify.)
>

... and for raising this issue.

I think this needs to be addressed. It would be great if we can avoid
it, but ...

AFAIU, this means bumping GOMP_VERSION_NVIDIA_PTX (1 -> 2).

Using a new a.out (registers with GOMP_VERSION_NVIDIA_PTX == 2) with an
old libgomp (supports GOMP_VERSION_NVIDIA_PTX <= 1) will give us an
"Offload data incompatible with PTX plugin" error.

Using an old a.out (registers with GOMP_VERSION_NVIDIA_PTX == 1) with a
new libgomp (supports GOMP_VERSION_NVIDIA_PTX <= 2) will have to be
supported in the way that things are currently handled.

Using a new a.out (registers with GOMP_VERSION_NVIDIA_PTX == 2) with a
new libgomp (supports GOMP_VERSION_NVIDIA_PTX <= 2) will have to be
supported in the way that the patch implements things.

The current approach is that all offload-functions are assumed to be
transformed by the optimization, which implies that failure to
transform should be a compilation error (is that indeed ensured by the
patch?). Which is a bit funny for an 'optimization'. We might wanna
decide to switch this on/off at offload-function level.

That ties in with the fact that if we're going to keep the path alive
for backward compatibility, it would be nice if we can actually test
this in the trunk version by disabling the optimization. Which is also
nice to have if we run into issues with the optimization. And once we
allow this to be disabled at user level, we're going to have to track
this at offload-function level.

So I'd say for GOMP_VERSION_NVIDIA_PTX == 2 we extend target_data with
a flag such that we can query things on a per offload-function level,
while taking care to represent the common case where the flag is the
same for all offload-functions in an economical way.

That leaves the question of how to get that information to mkoffload;
perhaps the patch Thomas mentioned can be of help there.

Thanks,
- Tom
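
To make the version matrix and the per-offload-function flag a bit more
concrete, here is a rough, self-contained sketch in C. It is not taken from
the patch or from libgomp: every name in it (the `sketch_` types, the
`default_new_style` field, the `exceptions` bitmap) is hypothetical, and the
layout is only one possible way to keep the common case cheap, namely a single
default for the whole image plus an optional bitmap marking the functions that
deviate from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: highest offload-data version this "new libgomp" accepts.  */
#define SKETCH_PTX_VERSION_MAX 2u

enum sketch_launch_style
{
  SKETCH_LAUNCH_REJECT, /* incompatible offload data, or bad function index */
  SKETCH_LAUNCH_OLD,    /* argument record set up the way it is done today */
  SKETCH_LAUNCH_NEW     /* launch path as implemented by the patch */
};

/* Hypothetical per-image data, standing in for an extended target_data.  */
struct sketch_image
{
  unsigned version;                /* version the a.out registered with */
  size_t fn_count;                 /* number of offload functions */
  bool default_new_style;          /* common case: one flag for all functions */
  const unsigned char *exceptions; /* one bit per function that deviates from
                                      the default; NULL if none deviate */
};

/* Decide how offload function FN of image IMG would be launched.  */
enum sketch_launch_style
sketch_launch_style_for (const struct sketch_image *img, size_t fn)
{
  assert (img != NULL);

  /* Offload data newer than the plugin supports: reject.  */
  if (img->version > SKETCH_PTX_VERSION_MAX || fn >= img->fn_count)
    return SKETCH_LAUNCH_REJECT;

  /* Old a.out with new libgomp: keep handling it the current way.  */
  if (img->version < 2)
    return SKETCH_LAUNCH_OLD;

  /* Version 2: start from the image-wide default, then apply the
     per-function exception bit if a bitmap is present.  */
  bool new_style = img->default_new_style;
  if (img->exceptions != NULL
      && ((img->exceptions[fn / 8] >> (fn % 8)) & 1u))
    new_style = !new_style;

  return new_style ? SKETCH_LAUNCH_NEW : SKETCH_LAUNCH_OLD;
}
```

With this layout, an image in which every function was transformed carries
only the default flag and a NULL bitmap; the bitmap is needed only for mixed
images, for example when the optimization was disabled for some functions.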