PR92881 "[OpenACC] In async context, need to use 'gomp_remove_var_async' instead of 'gomp_remove_var'" (was: [PATCH] OpenACC reference count overhaul)

Thomas Schwinge Tue, 10 Dec 2019 00:26:45 -0800

Hi Julian!

On 2019-12-09T15:04:15+0000, Julian Brown <jul...@codesourcery.com> wrote:
> On Mon, 9 Dec 2019 15:44:25 +0100
> Thomas Schwinge <tho...@codesourcery.com> wrote:
>> On 2019-10-03T09:35:04-0700, Julian Brown <jul...@codesourcery.com>
>> wrote:
>> > --- a/libgomp/oacc-mem.c
>> > +++ b/libgomp/oacc-mem.c  
>> 
>> > @@ -715,48 +684,34 @@ delete_copyout (unsigned f, void *h, size_t
>> > s, int async, const char *libfnname)  
>> 
>> >        if (f & FLAG_COPYOUT)
>> > [...]
>> >      gomp_copy_dev2host (acc_dev, aq, h, d, s);
>> >    }
>> > -      gomp_remove_var (acc_dev, n);
>> > +      gomp_remove_var_async (acc_dev, n, aq);  
>> 
>> Conceptually, I understand correctly that we need to use this (new)
>> 'gomp_remove_var_async' to make sure that we don't
>> 'gomp_free_device_memory' while the 'gomp_copy_dev2host' cited above
>> is still in process?
>
> Yep.


OK, so please prepare a patch changing just that, referencing PR92881:
's%gomp_remove_var%gomp_remove_var_async%' as cited above and also in
'libgomp/target.c:gomp_unmap_vars_internal' (for clarity, even though it
doesn't matter in practice as that call will never
'gomp_free_device_memory'; see
<871rtg43me.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/871rtg43me.fsf@euler.schwinge.homeip.net>),
plus the addition of 'libgomp/target.c:gomp_remove_var_async' etc.


>> I'm curious why this isn't causing any problems for nvptx offloading
>> already, any thoughts on that?  Or, is this just missing test
>> coverage? (Always difficult for 'async' stuff, of course.)  By
>> chance, is this right now already causing problems with AMD GCN
>> offloading?  (I really need to set up AMD GCN offloading testing...)
>
> In a few cases, async stuff on nvidia seems to "just work" even in
> cases where we wouldn't expect it to via inspection (either because the
> driver/hardware is doing something "magic"

Yeah, I too wondered whether there might be some such "magic" going on,
to "help" users...

> or because we're
> somehow driving async operations in such a way that they run
> synchronously in practice).

Hope that's not that case.  ;-)

> One such case is with the "ephemeral"
> asynchronous host-to-device memory copy patch.

(Yeah, I still need to look into that.)

> The AMD side seems much more sensitive to improper async behaviour --
> but I don't actually remember if I hit problems with this code in
> particular.


>> I'm citing below the changes introducing 'gomp_remove_var_async',
>> modelled similar to the existing 'gomp_unmap_vars_async'.
>> 
>> 
>> Also for both these, do I understand correctly, that it's actually not
>> the 'gomp_unref_tgt' that needs to be "delayed" via
>> 'goacc_asyncqueue', but rather really only the
>> 'gomp_free_device_memory', called via 'gomp_unmap_tgt', called via
>> 'gomp_unref_tgt'?  In other words: why do we need to keep the 'struct
>> target_mem_desc' alive?  Per my understanding, that one is one
>> component of the mapping table, and not relevant anymore (thus can be
>> 'free'd) as soon as it has been determined that 'tgt->refcount ==
>> 0'?  Am I missing something there?
>
> IIRC, that was Chung-Lin's choice. I'll CC him in.

;-) Or even mine; see 'gomp_unmap_vars_async' description and incremental
patch in <https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01620.html>.

> I think delaying
> freeing of the target_mem_desc isn't really a huge problem, in practice.

It certainly isn't a problem (only small bits of host memory "delayed"),
but it still isn't the most clean design.  Anyway:

>> It will be OK to clean that up later

>> but I'd like to understand this
>> now.  Well, or, stating that you just blindly copied that from the
>> existing 'gomp_unmap_vars_async' is fine, too!  ;-P
>
> Some changes arose via the porting to AMD GCN, and some may have been
> drive-by fixes (e.g. where a synchronous call was used in a context
> where it is obvious that an asynchronous call is really needed).

Please, again, for sake of easy review, always do such changes separately
from whatever else you're working on.  This of course will add a bit of
delay during your original development, but will make review and
reasoning much, much easier -- at that time, and also when someone
(yourself even?) needs to look up again something from the development
history.


> Like
> you mentioned, test coverage could probably be better, and writing
> reliable tests for async behaviour is challenging.

Thus we need to invent something, eventually.  Not testing stuff because
it's challenging is not a good excuse for shipping un-tested code.


Grüße
 Thomas

signature.asc
Description: PGP signature

PR92881 "[OpenACC] In async context, need to use 'gomp_remove_var_async' instead of 'gomp_remove_var'" (was: [PATCH] OpenACC reference count overhaul)

Reply via email to