tkonolige opened a new pull request, #12205: URL: https://github.com/apache/tvm/pull/12205
Add functions to estimate peak FLOPS and bandwidth for CUDA. Add a new registration mechanism to the roofline analysis so that support for any target can be added. The mechanism uses generic functions with per-target overrides: a new target only needs to provide `estimate_peak_bandwidth` and `estimate_peak_flops` functions. Also fix the CUDA codegen and tensorcore_infer_fragment.cc to support filling matrix_a and matrix_b fragments. @AndrewZhaoLuo
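For readers unfamiliar with the "generic function with overrides" pattern, here is a minimal, self-contained sketch of how such a registration could look. It does not use TVM and is not the code in this PR: the `GenericFunc` class, the single `target_kind` string argument, the `ridge_point` helper, and the numeric values are all hypothetical placeholders chosen for illustration. Only the names `estimate_peak_flops` and `estimate_peak_bandwidth` come from the description above.

```python
from typing import Callable, Dict


class GenericFunc:
    """Hypothetical stand-in for a generic function dispatched on target kind."""

    def __init__(self, default: Callable[..., float]):
        self._default = default
        self._overrides: Dict[str, Callable[..., float]] = {}
        self.__name__ = default.__name__

    def register(self, kind: str):
        """Return a decorator that installs an override for one target kind."""

        def decorator(fn: Callable[..., float]) -> Callable[..., float]:
            self._overrides[kind] = fn
            return fn

        return decorator

    def __call__(self, target_kind: str, *args, **kwargs) -> float:
        # Use the override for this target kind if one exists, else the default.
        impl = self._overrides.get(target_kind, self._default)
        return impl(target_kind, *args, **kwargs)


@GenericFunc
def estimate_peak_flops(target_kind: str) -> float:
    # Default: no estimator is registered for this target.
    raise NotImplementedError(f"no peak FLOPS estimator for {target_kind!r}")


@GenericFunc
def estimate_peak_bandwidth(target_kind: str) -> float:
    # Default: no estimator is registered for this target.
    raise NotImplementedError(f"no peak bandwidth estimator for {target_kind!r}")


# A new target only has to provide these two overrides.
@estimate_peak_flops.register("cuda")
def _cuda_peak_flops(target_kind: str) -> float:
    return 1.0e13  # placeholder value in FLOP/s, not a measurement


@estimate_peak_bandwidth.register("cuda")
def _cuda_peak_bandwidth(target_kind: str) -> float:
    return 5.0e11  # placeholder value in bytes/s, not a measurement


def ridge_point(target_kind: str) -> float:
    """Arithmetic intensity (FLOP/byte) at which the roofline turns flat."""
    return estimate_peak_flops(target_kind) / estimate_peak_bandwidth(target_kind)


if __name__ == "__main__":
    print(ridge_point("cuda"))
```

In this sketch the roofline code only ever calls the two generic functions, so adding a target means registering two overrides and touching nothing else. A target without overrides falls through to the default and raises `NotImplementedError`.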
