I still need more time to review this, but ... On 06/24/2017 12:54 AM, Chung-Lin Tang wrote: > Hi Cesar, Thomas, > This patch is the re-implementation of OpenACC async we talked about. > The changes are rather large, so I am putting it here for a few days before > actually committing them to gomp-4_0-branch. Would appreciate if you guys > take a look. > > To overall describe the highlights of the changes: > > (1) Instead of essentially implementing the entire OpenACC async support > inside the plugin, we now use an opaque 'goacc_asyncqueue' implemented > by the plugin, along with core 'test', 'synchronize', 'serialize', etc. > plugin functions. Most of the OpenACC specific logic is pulled into > libgomp/oacc-async.c
I'm not sure if plugins need to maintain backwards compatibility. However, I don't see any changes inside libgomp.map, so maybe it's not required. > (2) CUDA events are no longer used. The limitation of no CUDA calls inside > CUDA callbacks were a problem for resource freeing, but we now stash > them onto the ptx_device and free them later. Yay! > (3) For 'wait + async', we now add a local thread synchronize, instead > of just ordering the streams. > > (4) To work with the (3) change, some front end changes were added to > propagate argument-less wait clauses as 'wait(GOACC_ASYNC_NOVAL)' to > represent a 'wait all'. What's the significance of GOMP_ASYNC_NOVAL? Wouldn't it have been easier to make that change in the gimplifier? > Patch was tested to have no regressions on gomp-4_0-branch. I'll commit > this after the weekend (or Tues.) > * plugin/plugin-nvptx.c (struct cuda_map): Remove. > (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code. > (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function. These two functions seem extremely similar. I wonder if you should consolidate them. Overall, I like how you were able eliminate the externally managed map_* data structure which was used to pass in arguments to nvptx_exec. Although I wonder if we should just pass in those individual arguments directly to cuLaunchKernel. But that's a big change in itself. Cesar