psrivas2 opened a new pull request, #14465:
URL: https://github.com/apache/tvm/pull/14465

   This PR improves cutlass compilation time by compiling a single 
CSourceModule instead of creating and compiling one for each kernel.
   
   Creating and compiling a new CSourceModule for every function is slow, and 
it significantly slows down the compilation of models with many functions 
offloaded to cutlass. Instead, we can generate a single CSourceModule and 
compile it once to produce a single `runtime::Module`.
   This significantly reduces the cutlass compilation time of large models such as 
SD Unet (from ~30 min to ~4 min), with similar results on other large models.
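
   The single-module idea described above can be illustrated with a minimal, library-free sketch. It is not the code in this PR and uses no TVM APIs: `FakeCompiler`, `split_source`, `compile_per_kernel` and `compile_combined` are hypothetical names, and the compiler is a stub that only counts invocations. It assumes each offloaded function yields a standalone C++ source string, and shows that the per-function approach pays one compiler run per kernel while the combined approach merges everything (emitting shared `#include` lines once, an assumed detail) and compiles a single time.

```python
from typing import Dict, List, Tuple


class FakeCompiler:
    """Hypothetical stand-in for one expensive compiler run (e.g. nvcc).

    It records every source it is asked to compile and returns a
    placeholder module name instead of producing real object code.
    """

    def __init__(self) -> None:
        self.sources: List[str] = []

    @property
    def invocations(self) -> int:
        return len(self.sources)

    def compile(self, source: str) -> str:
        self.sources.append(source)
        return f"module_{self.invocations}"


def split_source(source: str) -> Tuple[List[str], List[str]]:
    """Separate #include lines from the rest of a kernel's source."""
    includes: List[str] = []
    body: List[str] = []
    for line in source.splitlines():
        (includes if line.startswith("#include") else body).append(line)
    return includes, body


def compile_per_kernel(kernels: Dict[str, str], cc: FakeCompiler) -> List[str]:
    """Per-function approach: one source module, and one compile, per kernel."""
    return [cc.compile(src) for src in kernels.values()]


def compile_combined(kernels: Dict[str, str], cc: FakeCompiler) -> str:
    """Single-module approach: merge all kernels into one source, compile once."""
    includes: List[str] = []
    bodies: List[str] = []
    for src in kernels.values():
        inc, body = split_source(src)
        for line in inc:
            if line not in includes:  # emit each header once, keep first-seen order
                includes.append(line)
        bodies.extend(body)
    return cc.compile("\n".join(includes + bodies))


# Hypothetical generated sources for three offloaded functions.
kernels = {
    "fused_matmul0": "#include <cutlass/gemm/device/gemm.h>\nvoid fused_matmul0() {}",
    "fused_matmul1": "#include <cutlass/gemm/device/gemm.h>\nvoid fused_matmul1() {}",
    "fused_conv2d0": "#include <cutlass/conv/device/implicit_gemm_convolution.h>\nvoid fused_conv2d0() {}",
}

per_kernel = FakeCompiler()
compile_per_kernel(kernels, per_kernel)
combined = FakeCompiler()
compile_combined(kernels, combined)
print("per-kernel compiles:", per_kernel.invocations)  # per-kernel compiles: 3
print("combined compiles:", combined.invocations)      # combined compiles: 1
```

   In this toy model the saving is simply N compiler runs collapsing to one; in practice the gain presumably comes from paying the fixed per-invocation cost (process startup, parsing the heavily templated cutlass headers) once instead of once per offloaded function.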
   
   #### Testing
   `tests/python/relax/test_codegen_cutlass.py::test_matmul_offload` is already 
failing at HEAD. All other tests pass locally with this PR.
   
   cc @masahi 


-- 
This is an automated message from the Apache Git Service.