ArmageddonKnight opened a new pull request #8678:
URL: https://github.com/apache/tvm/pull/8678


   # Short Summary
   
   Sometimes, when executing CUDA kernels, we might encounter the error 
`CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES` (e.g., 
[here](https://discuss.tvm.apache.org/t/cuda-got-error-cuda-error-launch-out-of-resources/4173)).
This happens because **the nvcc compiler allocates too many registers per 
thread**. If we then launch the CUDA kernel with too many threads per block, 
the GPU detects that the kernel requests more registers than are available on 
the chip and refuses to launch it.
   
   This implies that we need a way of telling nvcc how many threads per block 
to expect. Luckily, the `__launch_bounds__` directive achieves exactly that. 
This patch adds `__launch_bounds__` to the CUDA code generation procedure: the 
directive is printed automatically whenever the number of threads per block is 
detected to be a constant integer value. With this information, nvcc can spill 
registers if needed, which might hurt performance, but is still better than a 
CUDA kernel that fails to launch.
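
   To illustrate what the directive does, here is a minimal hand-written 
sketch of a kernel annotated the way the generated code would be. The kernel 
name, signature, and body are hypothetical; only the placement and meaning of 
`__launch_bounds__` follow the CUDA C++ convention.

```cuda
// Hypothetical kernel for illustration; names and body are made up.
// __launch_bounds__(256) promises nvcc that this kernel will never be
// launched with more than 256 threads per block, so the compiler can
// cap per-thread register usage (spilling to local memory if needed)
// instead of producing a binary that fails at launch time with
// CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES.
extern "C" __global__ void __launch_bounds__(256)
vec_add_kernel(const float* __restrict__ a,
               const float* __restrict__ b,
               float* __restrict__ c, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) c[i] = a[i] + b[i];
}
```

   Launching this kernel with a block size above 256 would then be a 
programming error, which is why the directive is only emitted when the thread 
extent is a known constant.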
   
   # Q & A
   
   **Q: Would this affect the AutoTVM and the auto-scheduler submodule?**
   
   A: No. Although the number of threads changes between trials in those 
cases, it is fixed to a constant by the time the code generation phase runs. 
Furthermore, if the number of threads per block is not a constant, 
`__launch_bounds__` is simply not printed.
   
   Any feedback on this patch is appreciated. @comaniac @icemelon @yzhliu 
@yidawang 

