merrymercy opened a new pull request #7040:
URL: https://github.com/apache/tvm/pull/7040


   Previously, we used `kMaxRegistersPerBlock`, obtained from the CUDA device 
query, as the value of `max_local_memory_per_block` in `VerifyGPUCode`. This 
is wrong: the two are not the same thing.
   Luckily, for NVIDIA GPUs, this bug does not affect performance, because 
`kMaxRegistersPerBlock` returns a very large value. The check in 
`VerifyGPUCode` with this large value hardly affects anything.
   
   We have to give it the correct name so that it is meaningful for other 
backends.
   A better way is to set it to `INT32_MAX`, which simply skips this check, 
because the CUDA runtime imposes no hard limit on this value. This can 
enlarge the search space while keeping most of the measured schedules 
valid.
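
   As a rough illustration of why `INT32_MAX` effectively disables the check, 
here is a minimal standalone sketch. It does not call TVM; the helper names 
(`exceeds_local_memory`, `filter_candidates`) and the byte counts are 
hypothetical, and the real logic in `VerifyGPUCode` may differ.

```python
# Standalone sketch; does NOT use TVM. All names and numbers are hypothetical.
INT32_MAX = 2**31 - 1


def exceeds_local_memory(local_bytes_used: int, max_local_memory_per_block: int) -> bool:
    """Return True if a candidate would be rejected by a local-memory limit."""
    return local_bytes_used > max_local_memory_per_block


def filter_candidates(candidates, limit):
    """Keep only the candidates whose local memory usage is within the limit."""
    return [c for c in candidates if not exceeds_local_memory(c, limit)]


# Hypothetical local memory usage (in bytes) of three candidate schedules.
candidates = [4 * 1024, 64 * 1024, 512 * 1024]

# Assumed stand-in for a register count mistakenly used as a memory limit.
register_count_as_limit = 65536

kept_with_register_limit = filter_candidates(candidates, register_count_as_limit)
kept_with_int32_max = filter_candidates(candidates, INT32_MAX)
```

   With the limit set to `INT32_MAX`, no realistic candidate is filtered out, 
so the check is a no-op and the search space is not restricted by it.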


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

