Kh4L opened a new pull request #20331: URL: https://github.com/apache/incubator-mxnet/pull/20331
## Description ##
Changes how GPU operations are synced in the `ThreadedEngine`.

Previously, a GPU operator had to complete before its output variables (either `NDArray` or Resource) could be tagged as readable. To know when a variable was ready to read, the engine workers had to call `cudaStreamSynchronize` on the GPU stream. This prevented GPU kernels from overlapping optimally with CPU operations.

This PR introduces a sync mechanism that leverages `cudaEvent` to avoid host synchronization between GPU operations. GPU operators that write a variable tag it with a `cudaEvent`-based sync object, and operators that read the variable sync with or wait on that object.

### Details ###
// TODO: add details

Co-author: @ptrendx

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
