https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120753

--- Comment #9 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Ok, now I see that the map clauses have extended wording that allows members of
structs, which is_device_ptr does not have:

https://www.openmp.org/spec-html/5.0/openmpsu109.html

https://www.openmp.org/spec-html/5.2/openmpsu41.html

Still, this is problematic, because the map clauses require everything to be
allocated on the host side...


For temporary data, the only route in OpenMP is omp_target_alloc followed by
is_device_ptr.
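
A minimal sketch of that route, assuming a plain pointer variable (not a struct
member) and the default device. The names scratch_alloc, scratch_free and
sum_of_squares are hypothetical and only serve the illustration; the fallback
to malloc/free is an assumption added so that the sketch also builds without
-fopenmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: allocate n doubles on the default device.
   Without OpenMP this falls back to the host heap. */
static double *scratch_alloc(size_t n)
{
#ifdef _OPENMP
    return omp_target_alloc(n * sizeof(double), omp_get_default_device());
#else
    return malloc(n * sizeof(double));
#endif
}

/* Hypothetical helper: release memory obtained from scratch_alloc. */
static void scratch_free(double *p)
{
#ifdef _OPENMP
    omp_target_free(p, omp_get_default_device());
#else
    free(p);
#endif
}

/* Returns 0^2 + 1^2 + ... + (n-1)^2, computed through a temporary buffer
   that never has a host-side counterpart. Returns -1.0 if allocation fails. */
double sum_of_squares(int n)
{
    assert(n > 0);

    double *tmp = scratch_alloc((size_t)n);
    if (tmp == NULL)
        return -1.0;

    double sum = 0.0;

    /* tmp is a plain pointer variable, so is_device_ptr accepts it. */
    #pragma omp target teams distribute parallel for is_device_ptr(tmp)
    for (int i = 0; i < n; ++i)
        tmp[i] = (double)i * (double)i;

    #pragma omp target teams distribute parallel for is_device_ptr(tmp) \
            map(tofrom: sum) reduction(+: sum)
    for (int i = 0; i < n; ++i)
        sum += tmp[i];

    scratch_free(tmp);
    return sum;
}
```

Calling sum_of_squares(n) from host code is all that is needed; the temporary
buffer exists only for the duration of the call.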


So by not allowing members of aggregate types, is_device_ptr rules out creating
classes or structs purely on the device and then looping over the elements of
their members.

But that is often needed for temporary calculations...

So the only way is to "unpack" the members before the loop, which does not make
much sense even in C, which has structs for a reason...
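
To make the "unpack" workaround concrete, here is a rough sketch. The struct
dev_vec and the functions dev_vec_alloc, dev_vec_free and fill_and_sum are
hypothetical names invented for this illustration, and the malloc/free
fallback is an assumption so that the sketch also builds without -fopenmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical aggregate whose data member points to device memory. */
typedef struct {
    double *data;
    int     len;
} dev_vec;

/* Hypothetical helper: device allocation, host heap without OpenMP. */
static double *dev_vec_alloc(size_t n)
{
#ifdef _OPENMP
    return omp_target_alloc(n * sizeof(double), omp_get_default_device());
#else
    return malloc(n * sizeof(double));
#endif
}

/* Hypothetical helper: release memory obtained from dev_vec_alloc. */
static void dev_vec_free(double *p)
{
#ifdef _OPENMP
    omp_target_free(p, omp_get_default_device());
#else
    free(p);
#endif
}

/* Fills a temporary vector with 2*i and returns the sum of its elements.
   Returns -1.0 if allocation fails. */
double fill_and_sum(int n)
{
    assert(n > 0);

    dev_vec v;
    v.len  = n;
    v.data = dev_vec_alloc((size_t)n);
    if (v.data == NULL)
        return -1.0;

    /* The member v.data cannot be named in is_device_ptr, so the members
       are copied into plain local variables before the loop. */
    double *data = v.data;
    int     len  = v.len;
    double  sum  = 0.0;

    #pragma omp target teams distribute parallel for is_device_ptr(data) \
            map(tofrom: sum) reduction(+: sum)
    for (int i = 0; i < len; ++i) {
        data[i] = 2.0 * (double)i;
        sum += data[i];
    }

    dev_vec_free(v.data);
    return sum;
}
```

The struct still exists on the host, but the loop only ever sees the unpacked
locals, which is exactly the awkwardness described above.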

Unfortunately, unified address does not really work.

At least not on typical NVIDIA GPUs, because it is extremely slow.

It appears that a page fault happens during each iteration and a mapping is
issued for each array element, which does not make much sense in terms of
speed. The loops are then simply a hundred times slower.
