@Hzfengsy Sure, we will share the code along with a sample schedule very soon; it's under internal review now. As you will see, the schedule for TensorCore CodeGen looks no different from a normal GPU matmul schedule. Everything is done in IR passes, including matrix_a/matrix_b/accumulator recognition, row/col_major recognition as @yangjunpro mentioned, thread-index unification within a warp for TensorCore operations, loop scaling, etc.
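To give a rough idea of what "loop scaling" means here, below is a minimal plain-Python sketch (the function names are hypothetical illustrations, not TVM's actual pass): once the pass recognizes the matrix_a/matrix_b/accumulator buffers, loops that iterated per element are rescaled to iterate per 16x16x16 wmma fragment, so each inner iteration corresponds to one TensorCore operation instead of one scalar FMA.

```python
FRAG = 16  # wmma fragment size (16x16x16 for fp16 TensorCores)

def scale_loops(extents, frag=FRAG):
    """Rescale elementwise loop extents (M, N, K) to fragment-level extents."""
    assert all(e % frag == 0 for e in extents), "extents must be multiples of 16"
    return tuple(e // frag for e in extents)

def matmul_fragments(M, N, K):
    """Enumerate the fragment-level ops a scaled matmul loop nest would issue."""
    fm, fn, fk = scale_loops((M, N, K))
    calls = []
    for i in range(fm):
        for j in range(fn):
            for k in range(fk):
                # each iteration now maps to one wmma mma_sync on a
                # 16x16 accumulator fragment, rather than one scalar FMA
                calls.append((i, j, k))
    return calls

# a 64x64x64 matmul scales from 64*64*64 scalar FMAs to 4*4*4 fragment ops
print(len(matmul_fragments(64, 64, 64)))  # -> 64
```

This is only meant to convey the shape of the transformation; in the real passes the scaling is applied to the IR loop nest after fragment recognition, together with the thread-index unification within each warp.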