@Hzfengsy Sure, we will show the code along with a sample schedule very soon;
it's under internal review now. As you will see, the schedule for TensorCore
CodeGen looks no different from a normal GPU matmul schedule. Everything is
done in IR passes, including matrix_a/matrix_b/accumulator recognition,
row/col_major recognition (as @yangjunpro mentioned), thread-index unification
within a warp for TensorCore operations, loop scaling, etc.
