szha commented on a change in pull request #14006: Dual stream cudnn Convolution backward() with MXNET_GPU_WORKER_NSTREAMS=2.
URL: https://github.com/apache/incubator-mxnet/pull/14006#discussion_r252870869
 
 

 ##########
 File path: src/operator/nn/cudnn/cudnn_convolution-inl.h
 ##########
 @@ -224,6 +233,14 @@ class CuDNNConvolutionOp {
     CHECK_EQ(in_data.size(), expected);
     CHECK_EQ(in_grad.size(), expected);
     Stream<gpu> *s = ctx.get_stream<gpu>();
+    Stream<gpu> *s_dgrad = parallelize_backward_kernels_ ? ctx.get_aux_stream<gpu>() : s;
+
+    // Make sure the dgrad kernel in the aux stream doesn't start before it would have,
+    // had it been launched into the operator's primary stream.
+    if (parallelize_backward_kernels_ && req[conv::kData] != kNullOp) {
+      CUDA_CALL(cudaEventRecord(dgrad_can_start_, s->stream_));
+      CUDA_CALL(cudaStreamWaitEvent(s_dgrad->stream_, dgrad_can_start_, 0));
+    }
 
 Review comment:
   I'm a bit concerned about leaving such cross-stream coordination to the operators. I think a better abstraction for this is needed.
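   For illustration only, here is a minimal sketch of what one such abstraction could look like: a RAII helper that hides the raw event record/wait calls from operator code. The name StreamFence and its interface are hypothetical, not part of MXNet or this PR; a real version would also cache the event across calls (as the PR does with dgrad_can_start_) rather than create and destroy one each time.

       #include <cuda_runtime.h>
       #include <cstdio>
       #include <cstdlib>

       // Minimal error check so the sketch stands alone; MXNet code would
       // use its CUDA_CALL macro instead.
       #define CHECK_CUDA(expr)                                           \
         do {                                                             \
           cudaError_t e = (expr);                                        \
           if (e != cudaSuccess) {                                        \
             std::fprintf(stderr, "CUDA: %s\n", cudaGetErrorString(e));   \
             std::abort();                                                \
           }                                                              \
         } while (0)

       // Hypothetical RAII fence: all work already enqueued on `producer`
       // is guaranteed to complete before anything enqueued on `consumer`
       // after this object is constructed.
       class StreamFence {
        public:
         StreamFence(cudaStream_t producer, cudaStream_t consumer) {
           // Timing info is not needed; disabling it makes the event cheaper.
           CHECK_CUDA(cudaEventCreateWithFlags(&event_, cudaEventDisableTiming));
           CHECK_CUDA(cudaEventRecord(event_, producer));
           CHECK_CUDA(cudaStreamWaitEvent(consumer, event_, 0));
         }
         ~StreamFence() { cudaEventDestroy(event_); }
         StreamFence(const StreamFence&) = delete;
         StreamFence& operator=(const StreamFence&) = delete;

        private:
         cudaEvent_t event_;
       };

       // The hunk above would then reduce to something like:
       //   if (parallelize_backward_kernels_ && req[conv::kData] != kNullOp) {
       //     StreamFence fence(s->stream_, s_dgrad->stream_);
       //     // ... launch dgrad into s_dgrad ...
       //   }

   Folding this into the stream accessor itself (e.g. having the aux-stream getter establish the ordering) would keep operators out of the event bookkeeping entirely, which seems to be the direction this comment is asking for.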
