comaniac commented on pull request #9261:
URL: https://github.com/apache/tvm/pull/9261#issuecomment-942064270


   IMHO, CUTLASS doesn't naturally benefit dynamic workloads, for exactly the 
reason you mentioned. We use CUTLASS internally for training, and it works well 
because we JIT-compile kernels with known shapes at runtime.
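   The shape-specialized JIT approach can be sketched roughly as follows. This is an illustrative stand-in, not the actual internal implementation: `_compile_gemm` stands for JIT-compiling a template kernel (e.g., a CUTLASS GEMM instantiation) for one concrete shape, and a cache keyed by shape avoids recompiling on later calls.

   ```python
   # Hypothetical sketch: compile a kernel per concrete shape seen at
   # runtime, then reuse it. All names here are illustrative.
   from typing import Callable, Dict, Tuple

   _kernel_cache: Dict[Tuple[int, int, int], Callable] = {}

   def _compile_gemm(m: int, n: int, k: int) -> Callable:
       """Stand-in for JIT-compiling a template kernel for fixed (m, n, k)."""
       def kernel(a, b):
           # Naive GEMM standing in for generated code; a is m x k, b is k x n.
           return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
                   for i in range(m)]
       return kernel

   def gemm(a, b):
       """Dispatch to a kernel specialized for the shapes observed at runtime."""
       m, k, n = len(a), len(a[0]), len(b[0])
       key = (m, n, k)
       if key not in _kernel_cache:  # compile only on first encounter
           _kernel_cache[key] = _compile_gemm(m, n, k)
       return _kernel_cache[key](a, b)
   ```

   The compilation cost is paid once per distinct shape, which is why this works well for training loops where shapes are fixed, but not for genuinely dynamic inference workloads.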
   
   In the case of using CUTLASS with BYOC in TVM for inference, my impression 
is that we could leverage its high-performance kernel templates while 1) 
keeping the binary self-contained, 2) fusing ops, and 3) doing only lightweight 
tuning (e.g., ~10 trials, similar to cuDNN). On the other hand, dynamic 
workloads remain challenging; hopefully our ongoing efforts on dynamic kernel 
tuning and generation will land soon to make that possible.
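   The "lightweight tuning" idea above amounts to measuring a small fixed budget of candidate configurations and keeping the fastest, rather than running a long search. A minimal sketch, with a made-up cost function in place of actually compiling and timing a kernel on the GPU (all names and the tile space are assumptions for illustration):

   ```python
   # Hypothetical sketch of ~10-trial tuning: sample a few candidate tile
   # configurations, measure each, keep the best.
   import random

   def evaluate(config):
       """Stand-in cost measurement; real code would time a compiled kernel."""
       tile_m, tile_n = config
       return abs(tile_m - 64) + abs(tile_n - 64)  # fake cost model

   def lightweight_tune(candidates, trials=10):
       """Try at most `trials` candidates; return the best (config, cost)."""
       sampled = random.sample(candidates, min(trials, len(candidates)))
       return min(((c, evaluate(c)) for c in sampled), key=lambda x: x[1])

   # Illustrative tile-size search space.
   tile_space = [(m, n) for m in (16, 32, 64, 128) for n in (16, 32, 64, 128)]
   ```

   With a template library like CUTLASS, the search space per op is small enough that a handful of trials gets close to the best configuration, which is what makes a cuDNN-like tuning budget plausible.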

