ptrendx commented on issue #16294: Add CMake flag `CMAKE_BUILD_TYPE=Release`
URL: https://github.com/apache/incubator-mxnet/pull/16294#issuecomment-536727538
 
 
   The `too many resources requested for launch` error most often means that 
the kernel needs more registers than are available. The register file on the 
GPU has a fixed capacity that is shared by all threads resident on an SM 
(streaming multiprocessor), so the more registers each thread uses, the fewer 
threads can actually be launched. The problem is that the number of threads 
per block is known only at runtime, not at compile time, so the compiler 
cannot use it to limit the number of registers or to spill some of them to 
local memory. A debug build uses more registers than a release build, which is 
why you hit the error in that particular kernel there. This can be solved by 
telling the compiler the maximum number of threads per block, and optionally 
the minimum number of blocks per SM, that the kernel will be launched with, via 
[launch 
bounds](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds).
 Inserting proper launch bounds should be a very easy change. If you have any 
problem applying it, just tell us exactly which kernel is giving this error and 
we can make a PR for it as well.
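
For illustration, here is a minimal sketch of what adding launch bounds could look like. It is not taken from the MXNet code base: `scale_kernel`, `run_scale_example` and the bound of 256 threads per block are hypothetical names and values chosen for the example. The real fix would use the affected kernel and the block size it is actually launched with.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical bound chosen for illustration; use the real launch block size.
#define MAX_THREADS_PER_BLOCK 256

// __launch_bounds__ promises the compiler that this kernel is never launched
// with more than MAX_THREADS_PER_BLOCK threads per block, so it can limit
// register usage (spilling if necessary) to keep such a launch valid.
__global__ void __launch_bounds__(MAX_THREADS_PER_BLOCK)
scale_kernel(const float* in, float* out, float alpha, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = alpha * in[i];
}

// Runs the kernel on a small input and returns true if the launch succeeded
// and the results are correct.
bool run_scale_example() {
  const int n = 1 << 20;
  const size_t bytes = n * sizeof(float);
  std::vector<float> h_in(n, 1.0f), h_out(n, 0.0f);

  float* d_in = nullptr;
  float* d_out = nullptr;
  if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

  // The block size must not exceed the bound declared on the kernel.
  const int threads = MAX_THREADS_PER_BLOCK;
  const int blocks = (n + threads - 1) / threads;
  scale_kernel<<<blocks, threads>>>(d_in, d_out, 2.0f, n);

  // "too many resources requested for launch" would be reported here.
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  if (err == cudaSuccess) {
    err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
  }
  cudaFree(d_in);
  cudaFree(d_out);

  if (err != cudaSuccess) {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return false;
  }
  for (float v : h_out) {
    if (v != 2.0f) return false;
  }
  return true;
}

int main() {
  // Report how many registers per thread the compiled kernel uses.
  cudaFuncAttributes attr;
  if (cudaFuncGetAttributes(&attr, scale_kernel) == cudaSuccess) {
    printf("registers per thread: %d, max threads per block: %d\n",
           attr.numRegs, attr.maxThreadsPerBlock);
  }
  return run_scale_example() ? 0 : 1;
}
```

Compiling with `nvcc -Xptxas -v` prints the register count per kernel, which is a quick way to compare debug and release builds and to confirm that the bound had an effect.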
