DickJC123 commented on a change in pull request #14006: Dual stream cudnn Convolution backward() with MXNET_GPU_WORKER_NSTREAMS=2.
URL: https://github.com/apache/incubator-mxnet/pull/14006#discussion_r252886799
 
 

 ##########
 File path: src/operator/nn/cudnn/cudnn_convolution-inl.h
 ##########
 @@ -224,6 +233,14 @@ class CuDNNConvolutionOp {
     CHECK_EQ(in_data.size(), expected);
     CHECK_EQ(in_grad.size(), expected);
     Stream<gpu> *s = ctx.get_stream<gpu>();
+    Stream<gpu> *s_dgrad = parallelize_backward_kernels_ ? ctx.get_aux_stream<gpu>() : s;
+
+    // Make sure the dgrad kernel in the aux stream doesn't start before it would have,
+    // had it been launched into the operator's primary stream.
+    if (parallelize_backward_kernels_ && req[conv::kData] != kNullOp) {
+      CUDA_CALL(cudaEventRecord(dgrad_can_start_, s->stream_));
+      CUDA_CALL(cudaStreamWaitEvent(s_dgrad->stream_, dgrad_can_start_, 0));
+    }
 
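The ordering pattern in the diff above, recording a CUDA event at the primary stream's current position and making the auxiliary stream wait on it before launching the dgrad kernel, can be sketched outside MXNet with plain CUDA runtime calls. This is a minimal sketch, not the operator's actual code: the names `primary`, `aux`, `dgrad_can_start`, and the stub kernels are hypothetical, and running it requires a CUDA-capable GPU.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Trivial stand-ins for the wgrad and dgrad convolution kernels.
__global__ void wgrad_stub(float *x) { x[0] += 1.0f; }
__global__ void dgrad_stub(float *x) { x[1] += 1.0f; }

int main() {
  cudaStream_t primary, aux;
  cudaStreamCreate(&primary);
  cudaStreamCreate(&aux);

  // A timing-disabled event is the usual choice for pure ordering.
  cudaEvent_t dgrad_can_start;
  cudaEventCreateWithFlags(&dgrad_can_start, cudaEventDisableTiming);

  float *buf;
  cudaMalloc(&buf, 2 * sizeof(float));
  cudaMemset(buf, 0, 2 * sizeof(float));

  // Work already queued on the primary stream.
  wgrad_stub<<<1, 1, 0, primary>>>(buf);

  // Record the primary stream's current position, then make the aux
  // stream wait on it: the dgrad kernel cannot start any earlier than
  // it would have, had it been launched into the primary stream.
  cudaEventRecord(dgrad_can_start, primary);
  cudaStreamWaitEvent(aux, dgrad_can_start, 0);
  dgrad_stub<<<1, 1, 0, aux>>>(buf);

  cudaDeviceSynchronize();
  float host[2];
  cudaMemcpy(host, buf, sizeof(host), cudaMemcpyDeviceToHost);
  printf("%g %g\n", host[0], host[1]);

  cudaFree(buf);
  cudaEventDestroy(dgrad_can_start);
  cudaStreamDestroy(primary);
  cudaStreamDestroy(aux);
  return 0;
}
```

Note that `cudaStreamWaitEvent` only orders the aux stream relative to work captured by the event; later launches into the primary stream are unaffected, which is why the event must be recorded before the aux-stream launch.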
 Review comment:
  @ptrendx agrees, and we've discussed an approach that would unburden operators from this low-level stream manipulation. Hold off on a final review / merge of this PR until I've prototyped it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
