[GitHub] DickJC123 commented on a change in pull request #14006: Dual stream cudnn Convolution backward() with MXNET_GPU_WORKER_NSTREAMS=2.

GitBox Mon, 18 Feb 2019 15:37:18 -0800

DickJC123 commented on a change in pull request #14006: Dual stream cudnn Convolution backward() with MXNET_GPU_WORKER_NSTREAMS=2.
URL: https://github.com/apache/incubator-mxnet/pull/14006#discussion_r257851551


 ##########
 File path: docs/faq/env_var.md
 ##########
 @@ -174,6 +174,12 @@ When USE_PROFILER is enabled in Makefile or CMake, the following environments ca
 
 ## Other Environment Variables
 
+* MXNET_GPU_WORKER_NSTREAMS
 
 Review comment:
   The short answer is yes: an operator with 3 inputs might make use of 3 streams in Backward(), so I did not want to propose an environment variable name like MXNET_GPU_WORKER_USE_DUAL_STREAM=0/1 that might soon become obsolete, whereas a stream count can grow without a rename. On the other hand, Convolution needs only 2 streams, and I did not want to burden this enhancement with more complexity than is needed at this time. I propose that when we have a use case for 3 or more streams, we expand the implementation then and use that case to test it.
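To illustrate why a count-valued variable ages better than a 0/1 flag, here is a minimal C++ sketch of reading MXNET_GPU_WORKER_NSTREAMS as a stream count. The function names ParseGpuWorkerNStreams and GetGpuWorkerNStreams are hypothetical, not MXNet's actual code, and both the fallback to 1 stream and the clamp to a maximum of 2 are assumptions chosen to mirror the current 2-stream limit.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical sketch, not MXNet's implementation: interpret the value of
// MXNET_GPU_WORKER_NSTREAMS as a stream count rather than an on/off flag.
// Assumptions: unset, empty, or malformed values fall back to 1 stream, and
// requests above `max_supported` are clamped to it.
int ParseGpuWorkerNStreams(const char* raw, int max_supported = 2) {
  assert(max_supported >= 1);
  if (raw == nullptr || *raw == '\0') return 1;
  char* end = nullptr;
  long n = std::strtol(raw, &end, 10);
  if (*end != '\0' || n < 1) return 1;          // not a positive integer
  if (n > max_supported) return max_supported;  // e.g. "3" -> 2 while the cap is 2
  return static_cast<int>(n);
}

// Reads the variable from the process environment (hypothetical helper).
int GetGpuWorkerNStreams(int max_supported = 2) {
  return ParseGpuWorkerNStreams(std::getenv("MXNET_GPU_WORKER_NSTREAMS"),
                                max_supported);
}
```

Raising the cap later (say, max_supported = 3) changes no user-facing names, which is the point of choosing a count-based variable.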
   
   At the end of every kernel execution, GPU utilization falls off as the remaining grid blocks finish, until the last one completes. When two streams are in use, these utilization gaps can be filled by work from the second stream. I would guess that a third stream would not add to this effect. On the other hand, suppose you had 3 small independent kernels that would each occupy a third of the GPU. With 2 streams, only two of them could run concurrently and the third would have to wait, whereas 3 streams would let all three run at once, so in this case 3 streams would be a win over 2.
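The 3-kernel argument can be made concrete with an idealized model, assumed here purely for illustration: every kernel takes one time unit, kernels never slow each other down, and at most min(streams, kernels that fit on the GPU at once) run concurrently. The function IdealizedTimeUnits is hypothetical and deliberately ignores the end-of-kernel utilization fall-off discussed above.

```cpp
#include <cassert>

// Toy model (an idealization, not a measurement): `num_kernels` independent
// kernels, each occupying 1/`kernels_per_gpu` of the GPU and taking one time
// unit. With `num_streams` streams, at most min(num_streams, kernels_per_gpu)
// kernels run at once, so the batch takes ceil(num_kernels / concurrency) units.
int IdealizedTimeUnits(int num_kernels, int kernels_per_gpu, int num_streams) {
  assert(num_kernels >= 0 && kernels_per_gpu >= 1 && num_streams >= 1);
  int concurrency = num_streams < kernels_per_gpu ? num_streams : kernels_per_gpu;
  return (num_kernels + concurrency - 1) / concurrency;  // integer ceiling
}
```

For 3 kernels that each occupy a third of the GPU, IdealizedTimeUnits(3, 3, 1) gives 3 units, IdealizedTimeUnits(3, 3, 2) gives 2, and IdealizedTimeUnits(3, 3, 3) gives 1, so under these assumptions the third stream saves a full unit. A fourth stream would not help, since only three such kernels fit on the GPU at once.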
   
   So it's a good question: how might we expand this to 3 or more streams? The MXNET_GPU_WORKER_NSTREAMS environment variable would remain unchanged, though the documentation would indicate that the framework supports values greater than 2. Existing settings such as MXNET_GPU_WORKER_NSTREAMS=2 would keep their meaning, so I think this could happen as part of a minor release. At the RunContext level, the GPUAuxStream* would be replaced by a std::vector<GPUAuxStream*>. The RunContext method get_gpu_aux_stream() might then be changed to RunContext::get_gpu_aux_stream(int aux_stream_id = 0), which would not break operator code that had already adopted the simpler aux_stream API proposed by this PR.
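A minimal C++ sketch of the RunContext change described above, assuming simplified stand-in types: GPUAuxStream and RunContext here are hypothetical placeholders, not MXNet's real definitions, and returning nullptr for an id beyond the configured streams is an assumed policy. It shows how a defaulted aux_stream_id keeps existing zero-argument calls to get_gpu_aux_stream() source-compatible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MXNet's GPUAuxStream; a real one would wrap a
// CUDA stream.
struct GPUAuxStream {
  int id;
};

// Hypothetical, simplified RunContext: the single GPUAuxStream* becomes a
// vector, and get_gpu_aux_stream() gains an index that defaults to 0, so
// existing zero-argument calls still compile and get the first aux stream.
struct RunContext {
  std::vector<GPUAuxStream*> aux_streams;

  GPUAuxStream* get_gpu_aux_stream(int aux_stream_id = 0) const {
    assert(aux_stream_id >= 0);  // negative ids are a caller bug
    if (static_cast<std::size_t>(aux_stream_id) >= aux_streams.size()) {
      return nullptr;  // assumed policy: fewer streams configured than requested
    }
    return aux_streams[aux_stream_id];
  }
};
```

An operator written against the single-aux-stream API keeps calling get_gpu_aux_stream() unchanged and receives stream 0, while a future operator wanting two auxiliary streams could also request get_gpu_aux_stream(1).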

