MasterJH5574 opened a new pull request, #18033:
URL: https://github.com/apache/tvm/pull/18033

   This PR introduces CUTLASS GEMM kernels, groupwise-scaled GEMM kernels, and 
group GEMM kernels for Blackwell GPUs.
   
   Files are reorganized slightly so that the exposed global functions are now 
architecture-agnostic. Prior to this PR, the global function names for CUTLASS 
kernels usually ended with `"_sm90"`, which complicates kernel dispatch in the 
frontend compiler when multiple architectures are supported, such as Hopper and 
Blackwell.
   
   Therefore, this PR renames those global functions so that their names 
are architecture-agnostic. At build time, only the kernels that the target 
architecture supports are built.
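
   The renaming idea can be illustrated with a minimal, self-contained sketch. 
It is not TVM's actual registry or build system: `REGISTRY`, `build_kernels`, 
`frontend_dispatch`, and the name `"demo.group_gemm"` are hypothetical and 
exist only for illustration, and the `"sm90"`/`"sm100"` labels for Hopper and 
Blackwell are assumptions of the sketch. It shows how a single 
architecture-agnostic name lets the caller look up a kernel without knowing 
which architecture the build targeted.

```python
from typing import Callable, Dict

# Hypothetical global-function table; stands in for a real registry.
REGISTRY: Dict[str, Callable[[], str]] = {}

# Hypothetical per-architecture implementations of the same kernel.
VARIANTS: Dict[str, Callable[[], str]] = {
    "sm90": lambda: "group_gemm built for Hopper",
    "sm100": lambda: "group_gemm built for Blackwell",
}


def build_kernels(target_arch: str) -> None:
    """Simulate a build: register only the variant the target arch supports."""
    REGISTRY.clear()
    impl = VARIANTS.get(target_arch)
    if impl is not None:
        # The registered name carries no architecture suffix.
        REGISTRY["demo.group_gemm"] = impl


def frontend_dispatch() -> str:
    """Look up the kernel by its architecture-agnostic name and call it."""
    return REGISTRY["demo.group_gemm"]()


if __name__ == "__main__":
    build_kernels("sm100")
    print(frontend_dispatch())
```

   With suffixed names, the caller would instead have to choose between 
several names depending on the architecture, which is the dispatch complexity 
described above.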

