On Tue, 24 Nov 2015 18:27:24 +0800 Chung-Lin Tang <clt...@codesourcery.com> wrote:
> Hi, this patch reworks some of the way that asynchronous copyouts are > implemented for OpenACC in libgomp. > > Before this patch, we had a somewhat confusing way of implementing > this by having two refcounts for each mapping: refcount and > async_refcount, which I never got working again after the last wave > of async regressions showed up. > > So this patch implements what I believe to be a simplification: > async_refcount is removed, and instead of trying to queue the async > copyouts during unmapping we actually do that during the plugin event > handling. This requires a addition of the async stream integer as an > argument to the register_async_cleanup plugin hook, but overall I > think this should be more elegant than before. This looks OK to me I think (I've only looked fairly briefly). I vaguely remember trying something along these lines in an earlier iteration of the async support -- maybe hitting problems with locking (I see you have code to mitigate problems with that, and locking generally has probably evolved a bit since I last looked at the code in detail anyway). Can event_gc ever be called when the *device* lock is held? I'm slightly concerned that pushing async unmapping into event_gc means that program-level semantics are deferred to the backend, which is arguably the wrong place. But then I don't understand what went wrong with the dual-refcount implementation, so maybe it's unavoidable for some reason. HTH, Julian