tkonolige opened a new pull request, #12205:
URL: https://github.com/apache/tvm/pull/12205

   Add functions to estimate peak FLOPS and bandwidth for CUDA. Add a new 
registration mechanism to the roofline analysis so that support for any target 
can be added. This mechanism uses generic functions with per-target overrides. 
New targets only need to provide `estimate_peak_bandwidth` and 
`estimate_peak_flops` functions.
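
   To illustrate the generic-function-with-overrides idea, here is a minimal, 
self-contained Python sketch. It is not the code from this PR and does not use 
TVM's API: the `GenericFunc` class, the `register` decorator, the 
`roofline_bound` helper, the function signatures, and the numeric values are 
all assumptions made for illustration. Only the names `estimate_peak_flops` and 
`estimate_peak_bandwidth` come from the description above.

```python
from typing import Callable, Dict


class GenericFunc:
    """Hypothetical helper: a default implementation plus per-target overrides."""

    def __init__(self, default: Callable[..., float]):
        self._default = default
        self._overrides: Dict[str, Callable[..., float]] = {}

    def register(self, target_kind: str):
        """Return a decorator that installs an override for `target_kind`."""
        def decorator(func: Callable[..., float]) -> Callable[..., float]:
            self._overrides[target_kind] = func
            return func
        return decorator

    def __call__(self, target_kind: str) -> float:
        # Dispatch to the override if one exists, else to the default.
        impl = self._overrides.get(target_kind, self._default)
        return impl(target_kind)


@GenericFunc
def estimate_peak_flops(target_kind: str) -> float:
    raise NotImplementedError(f"no peak-FLOPS estimator for {target_kind!r}")


@GenericFunc
def estimate_peak_bandwidth(target_kind: str) -> float:
    raise NotImplementedError(f"no peak-bandwidth estimator for {target_kind!r}")


# A new target only has to supply these two overrides.
@estimate_peak_flops.register("cuda")
def _cuda_peak_flops(target_kind: str) -> float:
    return 10e12  # placeholder value in FLOP/s, not a measurement


@estimate_peak_bandwidth.register("cuda")
def _cuda_peak_bandwidth(target_kind: str) -> float:
    return 500e9  # placeholder value in bytes/s, not a measurement


def roofline_bound(target_kind: str, arithmetic_intensity: float) -> float:
    """Standard roofline bound: min(peak FLOP/s, bandwidth * FLOP per byte)."""
    return min(
        estimate_peak_flops(target_kind),
        estimate_peak_bandwidth(target_kind) * arithmetic_intensity,
    )
```

   With these placeholder numbers, `roofline_bound("cuda", 4.0)` is limited by 
bandwidth, while `roofline_bound("cuda", 100.0)` is limited by peak compute. A 
target with no registered overrides falls through to the defaults, which raise 
`NotImplementedError`.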
   
   Also fix the CUDA codegen and `tensorcore_infer_fragment.cc` to support 
filling `matrix_a` and `matrix_b` fragments.
   
   @AndrewZhaoLuo 
   

