[GitHub] [tvm] masahi opened a new pull request #9402: [CUTLASS, Eazy] Cache profiling result and support compiling generated kernels in parallel

GitBox Sun, 31 Oct 2021 02:32:16 -0700


masahi opened a new pull request #9402:
URL: https://github.com/apache/tvm/pull/9402



   Adds two quality-of-life improvements for applying CUTLASS BYOC to real-world 
models such as `bert-large`:
   
   * Cache profiling results so that profiling is not repeated for the 
same `(M, N, K)`.
   * Use a new feature in CUDA 11.2 to compile all generated kernels in 
parallel. `bert-large` ends up generating hundreds of kernels after profiling, 
and all of them are compiled by a single invocation of `nvcc`. Without this 
feature, compilation is too slow and painful.
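
A minimal sketch of the two ideas above, not the code in this PR. The names `ProfileCache`, `best_kernel`, and `nvcc_parallel_flags` are hypothetical and exist only for illustration. The sketch also assumes that the CUDA 11.2 feature in question is the `--threads` option of `nvcc`; the description above does not name the flag.

```python
from typing import Callable, Dict, List, Tuple

Shape = Tuple[int, int, int]  # (M, N, K) of a GEMM workload


class ProfileCache:
    """Hypothetical in-memory cache of profiling results keyed by (M, N, K)."""

    def __init__(self, profile_fn: Callable[[int, int, int], str]) -> None:
        # profile_fn stands in for the expensive step that benchmarks
        # candidate kernels and returns the name of the fastest one.
        self._profile_fn = profile_fn
        self._best: Dict[Shape, str] = {}

    def best_kernel(self, m: int, n: int, k: int) -> str:
        key = (m, n, k)
        if key not in self._best:
            # Profile only the first time this shape is seen.
            self._best[key] = self._profile_fn(m, n, k)
        return self._best[key]


def nvcc_parallel_flags(cuda_version: Tuple[int, int], threads: int) -> List[str]:
    """Return extra nvcc flags for parallel compilation.

    Assumption: the feature is nvcc's --threads option, so it is only
    emitted when the toolkit version is at least 11.2.
    """
    if cuda_version >= (11, 2):
        return ["--threads", str(threads)]
    return []
```

With a cache like this, a model that repeats the same GEMM shape in many layers pays the profiling cost once per distinct `(M, N, K)` rather than once per layer.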

