Kh4L opened a new pull request #20331:
URL: https://github.com/apache/incubator-mxnet/pull/20331


   ## Description ##
   Changes how the GPU operations are synced in the `ThreadedEngine`.
   
   Previously, a GPU operator had to complete before its output variables (either 
`NDArray` or Resource) could be tagged as readable. The engine workers had to 
call `cudaStreamSynchronize` on the GPU stream to know when a variable was ready 
to read. This host-side wait prevented GPU kernels from overlapping optimally 
with CPU operations. 
 
   
   This PR introduces a sync mechanism that leverages `cudaEvent` to avoid 
host synchronization between GPU operations. 
   Operators that write on the GPU tag the variables with a `cudaEvent`-based 
sync object, and operators that read those variables sync or wait on these 
objects.
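
   As a rough illustration of the idea (not the engine code in this PR), the sketch below uses only the CUDA runtime API. The kernels `produce` and `consume`, the stream names, and `run_event_sync_demo` are hypothetical stand-ins for a writing operator, a reading operator, and the engine workers. The writer records a `cudaEvent` after its kernel, and the reader's stream waits on that event with `cudaStreamWaitEvent`, so the host does not block between the two GPU operations.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical writing operator: fills the variable on the GPU.
__global__ void produce(int* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = 41;
}

// Hypothetical reading operator: consumes the variable on the GPU.
__global__ void consume(const int* in, int* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] + 1;
}

// Returns true if the reader observed the writer's output on every element.
bool run_event_sync_demo() {
  const int n = 1 << 16;
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;

  int* in = nullptr;
  int* out = nullptr;
  cudaStream_t writer = nullptr, reader = nullptr;
  cudaEvent_t ready = nullptr;

  bool ok = cudaMalloc(&in, n * sizeof(int)) == cudaSuccess &&
            cudaMalloc(&out, n * sizeof(int)) == cudaSuccess &&
            cudaStreamCreate(&writer) == cudaSuccess &&
            cudaStreamCreate(&reader) == cudaSuccess &&
            // Timing is not needed for pure synchronization.
            cudaEventCreateWithFlags(&ready, cudaEventDisableTiming) == cudaSuccess;

  std::vector<int> host(n, 0);
  if (ok) {
    // Writer: launch the kernel, then tag the variable by recording the event.
    produce<<<blocks, threads, 0, writer>>>(in, n);
    ok = cudaEventRecord(ready, writer) == cudaSuccess;

    // Reader: the wait is enqueued on the GPU stream; the host thread returns
    // immediately and stays free to run CPU work.
    ok = ok && cudaStreamWaitEvent(reader, ready, 0) == cudaSuccess;
    consume<<<blocks, threads, 0, reader>>>(in, out, n);

    // The host synchronizes only when it actually needs the result.
    ok = ok && cudaMemcpyAsync(host.data(), out, n * sizeof(int),
                               cudaMemcpyDeviceToHost, reader) == cudaSuccess;
    ok = ok && cudaStreamSynchronize(reader) == cudaSuccess;
  }

  for (int i = 0; ok && i < n; ++i) ok = (host[i] == 42);

  // Release resources; each handle is destroyed only if it was created.
  if (ready) cudaEventDestroy(ready);
  if (reader) cudaStreamDestroy(reader);
  if (writer) cudaStreamDestroy(writer);
  cudaFree(out);
  cudaFree(in);
  return ok;
}
```

   Calling `run_event_sync_demo()` from a `main` function on a machine with a CUDA device exercises the whole path. The only host-side wait is the final `cudaStreamSynchronize`, issued when the CPU needs the data, rather than one after every GPU operator.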
   
   ### Details ###
   // TODO: add details
   
   Co-author: @ptrendx 

