MasterJH5574 opened a new pull request, #18033: URL: https://github.com/apache/tvm/pull/18033
This PR introduces CUTLASS GEMM kernels, groupwise-scaled GEMM kernels, and group GEMM kernels for Blackwell GPUs. Files are reorganized so that the exposed global functions are now architecture agnostic.

Prior to this PR, the global function names for CUTLASS kernels usually ended with `"_sm90"`, which adds complexity when the frontend compiler has to dispatch kernels across multiple supported architectures, such as Hopper and Blackwell. This PR therefore renames those global functions so that their names are architecture agnostic. At build time, only the kernels that the target architecture supports are built.
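To illustrate the naming idea, here is a minimal Python sketch of an architecture-agnostic lookup. It is not code from this PR or from TVM: the names `KERNELS_BY_ARCH`, `build_registry`, `dispatch`, and `"cutlass.gemm"` are all hypothetical, and `"sm100"` is assumed here as the label for Blackwell. The sketch only shows why a single suffix-free name keeps the frontend simple when the build registers kernels for one architecture.

```python
from typing import Callable, Dict

# Hypothetical table of kernels per architecture; the real kernels are
# compiled CUTLASS code, not Python callables.
KERNELS_BY_ARCH: Dict[str, Dict[str, Callable[[], str]]] = {
    "sm90": {"cutlass.gemm": lambda: "gemm built for sm90"},    # Hopper
    "sm100": {"cutlass.gemm": lambda: "gemm built for sm100"},  # Blackwell (assumed label)
}


def build_registry(target_arch: str) -> Dict[str, Callable[[], str]]:
    """Simulate build time: register only the kernels for target_arch,
    under names that carry no architecture suffix."""
    return dict(KERNELS_BY_ARCH.get(target_arch, {}))


def dispatch(registry: Dict[str, Callable[[], str]], name: str) -> str:
    """Simulate the frontend: look up a kernel by its arch-agnostic name."""
    if name not in registry:
        raise KeyError(f"kernel {name!r} was not built for this architecture")
    return registry[name]()


if __name__ == "__main__":
    registry = build_registry("sm100")
    print(dispatch(registry, "cutlass.gemm"))
```

With this layout the caller never branches on the architecture: it asks for `"cutlass.gemm"` and gets whichever implementation was built, or an error if none was.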
