[PR] [MetaScheduler][ROCm] Add MultiLevelTilingMatrixCore rule for auto-tensorization on ROCm [tvm]

via GitHub Sun, 22 Oct 2023 22:00:11 -0700


malixian opened a new pull request, #15967:
URL: https://github.com/apache/tvm/pull/15967


   This Pull Request adds support for AMD Matrix Core in MetaScheduler.
   
   ## Changes Made
   
   - **Add code generation for HIP**. Additional support for HIP code generation is necessary. Generating HIP code has two advantages: (1) the generated code is highly readable, and (2) it avoids some problems caused by generating LLVM IR. This part of the code was mainly contributed by @LeiWang1999.
   - **Add a multi-level tiling schedule rule for Matrix Core**, similar to the implementation for CUDA Tensor Cores.
   - **Add WMMA tensor intrinsics**.
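
For readers unfamiliar with multi-level tiling, the idea behind the schedule rule can be sketched in plain Python. This is only an illustration of the loop structure, not the code added by this PR: the names `matmul_tiled` and `mma_fragment`, and the `block` and `frag` sizes, are hypothetical, and the innermost fragment update merely stands in for the step that a hardware matrix intrinsic would perform.

```python
def matmul_naive(a, b):
    """Reference matmul on lists of lists: C[i][j] = sum_k A[i][k] * B[k][j]."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def mma_fragment(a, b, c, i0, j0, k0, frag):
    """Accumulate one frag x frag x frag tile into C.

    Stand-in for the step a matrix intrinsic would execute (illustrative only).
    """
    for i in range(i0, i0 + frag):
        for j in range(j0, j0 + frag):
            acc = c[i][j]
            for p in range(k0, k0 + frag):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc


def matmul_tiled(a, b, block=4, frag=2):
    """Two-level tiled matmul: outer block tiles, inner fragment tiles.

    block and frag are hypothetical tile sizes chosen for illustration.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert m % block == 0 and n % block == 0 and k % block == 0
    assert block % frag == 0
    c = [[0] * n for _ in range(m)]
    # Level 1: walk over block-sized tiles of the output and the reduction axis.
    for bi in range(0, m, block):
        for bj in range(0, n, block):
            for bk in range(0, k, block):
                # Level 2: split each block into fragment-sized tiles.
                for fi in range(bi, bi + block, frag):
                    for fj in range(bj, bj + block, frag):
                        for fk in range(bk, bk + block, frag):
                            mma_fragment(a, b, c, fi, fj, fk, frag)
    return c
```

A schedule rule of this kind searches over how the loops are split and ordered, while the fragment-level step is replaced by a tensor intrinsic; the fixed sizes above are placeholders for values that tuning would choose.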
   
   ## Test Result
   Compared with template-based scheduling, MetaScheduler achieves better performance when `max_trials_global` is set to 32.
   | Data type | M, N, K | Template-based (GFLOPS) | MetaScheduler (GFLOPS) |
   | ---- | ---- | ---- | ---- |
   | f16f16f32 | 128, 128, 128 | 692 | 1251 |
   | f16f16f32 | 256, 256, 256 | 1166 | 5310 |
   | f16f16f32 | 1024, 1024, 1024 | 24126 | 28856 |
   | f16f16f32 | 4096, 4096, 4096 | 44041 | 61763 |
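
To put the table in perspective, the relative speedups can be computed directly from the reported numbers. The sketch below also shows the usual way a GFLOPS figure is derived from a matmul timing; the `2 * M * N * K` operation count is an assumption here, since the PR does not state how its figures were measured, and the function and variable names are hypothetical.

```python
def matmul_gflops(m, n, k, seconds):
    """GFLOPS for an M x N x K matmul, assuming 2*M*N*K floating-point ops."""
    return 2.0 * m * n * k / seconds / 1e9


# Reported throughput in GFLOPS, keyed by problem size (M = N = K):
# (template-based, MetaScheduler), copied from the table above.
RESULTS = {
    128: (692, 1251),
    256: (1166, 5310),
    1024: (24126, 28856),
    4096: (44041, 61763),
}


def speedups(results):
    """Ratio of MetaScheduler throughput to template-based throughput."""
    return {size: tuned / template for size, (template, tuned) in results.items()}


if __name__ == "__main__":
    for size, ratio in speedups(RESULTS).items():
        print(f"{size}^3: {ratio:.2f}x")
```

On the reported numbers the gain ranges from roughly 1.2x at 1024 to about 4.6x at 256, so the benefit is largest at the smaller problem sizes.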


