merrymercy opened a new pull request #7040: URL: https://github.com/apache/tvm/pull/7040
Previously, `VerifyGPUCode` used `kMaxRegistersPerBlock`, obtained from the CUDA device query, as the value of `max_local_memory_per_block`. This is wrong: the two are not the same thing. Luckily, on NVIDIA GPUs this bug does not affect performance, because `kMaxRegistersPerBlock` returns a very large value, so the check in `VerifyGPUCode` against that value hardly affects anything.

We have to rename it to the correct name so that it is more meaningful for other backends. A better approach is to set it to `INT32_MAX` to simply skip this check, because the CUDA runtime imposes no hard limit on this value. This can enlarge the search space while keeping most of the measured schedules valid.
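
As a rough illustration of why a limit of `INT32_MAX` effectively disables the check, here is a minimal Python sketch. It is not TVM code: `passes_local_memory_check`, `valid_candidates`, the candidate names, and the byte counts are all hypothetical, and it assumes the check is a plain upper-bound comparison on local memory per block.

```python
INT32_MAX = 2**31 - 1


def passes_local_memory_check(local_bytes_per_block, max_local_memory_per_block):
    """Hypothetical stand-in for the check: usage must not exceed the limit."""
    return local_bytes_per_block <= max_local_memory_per_block


# Hypothetical candidate schedules: name -> local memory used per block (bytes).
CANDIDATES = {"small": 4096, "medium": 65536, "large": 1 << 20}


def valid_candidates(limit):
    """Return the names of candidates that pass the check under `limit`."""
    return [
        name
        for name, used in CANDIDATES.items()
        if passes_local_memory_check(used, limit)
    ]


if __name__ == "__main__":
    # A finite limit prunes the search space.
    print(valid_candidates(65536))
    # With INT32_MAX, no realistic candidate is rejected by this check.
    print(valid_candidates(INT32_MAX))
```

With a finite limit, the largest candidate is rejected before measurement; with `INT32_MAX`, every candidate in this toy set passes, which is the sense in which the check is skipped.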
