DickJC123 commented on a change in pull request #14006: Dual stream cudnn
Convolution backward() with MXNET_GPU_WORKER_NSTREAMS=2.
URL: https://github.com/apache/incubator-mxnet/pull/14006#discussion_r252886799
##########
File path: src/operator/nn/cudnn/cudnn_convolution-inl.h
##########
@@ -224,6 +233,14 @@ class CuDNNConvolutionOp {
CHECK_EQ(in_data.size(), expected);
CHECK_EQ(in_grad.size(), expected);
Stream<gpu> *s = ctx.get_stream<gpu>();
+ Stream<gpu> *s_dgrad = parallelize_backward_kernels_ ? ctx.get_aux_stream<gpu>() : s;
+
+ // Make sure the dgrad kernel in the aux stream doesn't start before it would have
+ // had it been launched into the operator's primary stream.
+ if (parallelize_backward_kernels_ && req[conv::kData] != kNullOp) {
+ CUDA_CALL(cudaEventRecord(dgrad_can_start_, s->stream_));
+ CUDA_CALL(cudaStreamWaitEvent(s_dgrad->stream_, dgrad_can_start_, 0));
+ }
Review comment:
@ptrendx agrees, and we've discussed an approach that would unburden
operators from this low-level stream manipulation. Please hold off on a final
review / merge of this PR until I've prototyped it.
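
For readers unfamiliar with the pattern in the diff: `cudaEventRecord` plus `cudaStreamWaitEvent` is the standard CUDA way to order work across streams without a host-side sync. The following is a minimal standalone sketch of that pattern, not the PR's actual code; the kernel, stream, and event names are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Stand-in for the dgrad convolution kernel launched into the aux stream.
__global__ void dgrad_kernel() {}

int main() {
  cudaStream_t primary, aux;
  cudaEvent_t dgrad_can_start;
  cudaStreamCreate(&primary);
  cudaStreamCreate(&aux);
  // Timing is not needed for ordering, so disable it for lower overhead.
  cudaEventCreateWithFlags(&dgrad_can_start, cudaEventDisableTiming);

  // ... work enqueued on `primary` up to this point ...

  // Record the event on the primary stream, then make the aux stream wait
  // on it: the dgrad kernel cannot begin before the primary stream has
  // executed everything enqueued ahead of the record, which is exactly
  // "no earlier than if it had been launched into the primary stream".
  cudaEventRecord(dgrad_can_start, primary);
  cudaStreamWaitEvent(aux, dgrad_can_start, 0);
  dgrad_kernel<<<1, 1, 0, aux>>>();

  cudaStreamSynchronize(aux);
  cudaEventDestroy(dgrad_can_start);
  cudaStreamDestroy(aux);
  cudaStreamDestroy(primary);
  return 0;
}
```

Both calls are asynchronous with respect to the host, so the CPU never blocks; the ordering is enforced entirely on the device.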
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services