KnowingNothing opened a new pull request #9909:
URL: https://github.com/apache/tvm/pull/9909


   This change adds broad (though not exhaustive) PTX MMA code generation 
support for three generations of Tensor Cores: Volta, Turing, and Ampere. The 
generation logic is implemented mainly in ptx_mma.cc and should not 
significantly affect existing code. A test file is also provided in 
tests/python/unittest/test_tir_ptx_mma.py. Known limitations, where further 
improvement is possible:
   1. Correctness tests for the int4 and binary MMA instructions are missing 
because NumPy does not support int4 or binary kernels.
   2. The binary MMA implementation generates 
`mma.sync.aligned.m16n8k256.row.col.s32.b1.b1.s32.and.popc` for uint1 and 
`mma.sync.aligned.m16n8k256.row.col.s32.b1.b1.s32.xor.popc` for int1. This 
mapping may not be the best choice.
   3. Only tir.call_extern is supported. tir.call_intrin could also be 
supported.
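
To make the two binary variants in item 2 concrete, here is a minimal NumPy sketch of the arithmetic they perform: each output element accumulates the population count of the bitwise AND (or XOR) of a row of A with a column of B. The function name `binary_mma_reference` and the unpacked 0/1 input layout are assumptions made for illustration; they are not part of this PR, and real operands are bit-packed.

```python
import numpy as np


def binary_mma_reference(a_bits, b_bits, c, op):
    """Reference for D = C + popc(A op B), reduced over K.

    a_bits: (M, K) array of 0/1 values (one bit per element, unpacked).
    b_bits: (K, N) array of 0/1 values.
    c:      (M, N) int32 accumulator.
    op:     "and" or "xor".
    """
    a = a_bits.astype(np.int32)
    b = b_bits.astype(np.int32)
    # For single bits, AND is a product, so popc(a AND b) over K is a matmul.
    and_counts = a @ b
    if op == "and":
        counts = and_counts
    elif op == "xor":
        # For single bits, a XOR b == a + b - 2 * (a AND b).
        row_ones = a.sum(axis=1, keepdims=True)  # shape (M, 1)
        col_ones = b.sum(axis=0, keepdims=True)  # shape (1, N)
        counts = row_ones + col_ones - 2 * and_counts
    else:
        raise ValueError(f"unsupported op: {op}")
    return c + counts


# Example with the m16n8k256 shape named in item 2.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(16, 256))
B = rng.integers(0, 2, size=(256, 8))
C = np.zeros((16, 8), dtype=np.int32)
D_and = binary_mma_reference(A, B, C, "and")
D_xor = binary_mma_reference(A, B, C, "xor")
```

A reference of this kind could be compared against the generated kernel's output once the inputs are packed into the bit layout the instruction expects; that packing step is not shown here.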
   
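
For item 1, one possible workaround (a sketch only, not something this PR implements) is to keep int4 values in int8 containers for the NumPy reference and pack them into bytes only when feeding the kernel. The helper names `int4_mma_reference` and `pack_int4`, and the low-nibble-first packing order, are assumptions; the layout actually required by the generated code would need to be checked.

```python
import numpy as np


def int4_mma_reference(a, b, c):
    """Reference D = C + A @ B with int4 values stored in int8 arrays.

    a: (M, K) int8 array with values in [-8, 7].
    b: (K, N) int8 array with values in [-8, 7].
    c: (M, N) int32 accumulator.
    """
    for name, arr in (("a", a), ("b", b)):
        if arr.min() < -8 or arr.max() > 7:
            raise ValueError(f"{name} holds values outside the int4 range")
    # Widen to int32 before multiplying so the products cannot overflow.
    return c + a.astype(np.int32) @ b.astype(np.int32)


def pack_int4(values):
    """Pack adjacent pairs along the last axis into one byte each.

    Assumed layout: the even-indexed element goes in the low nibble,
    the odd-indexed element in the high nibble.
    """
    if values.shape[-1] % 2 != 0:
        raise ValueError("last axis must have even length")
    nibbles = values.astype(np.uint8) & 0xF  # two's-complement low 4 bits
    return (nibbles[..., 0::2] | (nibbles[..., 1::2] << 4)).astype(np.uint8)


a = np.array([[1, -2, 3, -8]], dtype=np.int8)
b = np.ones((4, 1), dtype=np.int8)
c = np.array([[10]], dtype=np.int32)
d = int4_mma_reference(a, b, c)
packed_a = pack_int4(a)
```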



