KnowingNothing opened a new pull request #9909: URL: https://github.com/apache/tvm/pull/9909
This change adds broad, though not exhaustive, PTX MMA code generation support for three generations of Tensor Cores: Volta, Turing, and Ampere. The generation logic is mainly implemented in ptx_mma.cc and should have no major influence on existing code. A test file is provided in tests/python/unittest/test_tir_ptx_mma.py.

Known limitations, where further improvement is possible:

1. Correctness tests for int4 and binary MMA instructions are missing because NumPy has no support for int4 and binary kernels.
2. The binary MMA implementation generates `mma.sync.aligned.m16n8k256.row.col.s32.b1.b1.s32.and.popc` for uint1 and `mma.sync.aligned.m16n8k256.row.col.s32.b1.b1.s32.xor.popc` for int1. This mapping may not be the best choice.
3. Only tir.call_extern is supported. tir.call_intrin could possibly be supported as well.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
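As an illustration of limitations 1 and 2 in the description above, the following is a minimal pure-Python sketch, not code from this pull request. The helper names `binary_mma_instruction` and `binary_mma_reference` are hypothetical and are not part of TVM. The first mirrors the uint1/int1 to `and.popc`/`xor.popc` choice stated above; the second shows how a reference result for a binary MMA could be computed without NumPy, assuming the usual semantics of a bitwise operation followed by a population count added to the accumulator.

```python
# Hypothetical helpers for illustration only; these names do not exist in TVM.

def binary_mma_instruction(dtype: str) -> str:
    """Return the PTX instruction string the description associates with a 1-bit dtype."""
    # Mapping as stated in the pull request text: uint1 -> and.popc, int1 -> xor.popc.
    bit_op = {"uint1": "and", "int1": "xor"}[dtype]
    return f"mma.sync.aligned.m16n8k256.row.col.s32.b1.b1.s32.{bit_op}.popc"


def binary_mma_reference(a_rows, b_cols, c, bit_op):
    """Reference D = C + popcount(A op B) on bit-packed operands.

    a_rows: list of M ints, each holding the K bits of one row of A.
    b_cols: list of N ints, each holding the K bits of one column of B.
    c:      M x N nested list of integer accumulators.
    bit_op: "and" or "xor".
    """
    if bit_op not in ("and", "xor"):
        raise ValueError(f"unsupported bit operation: {bit_op}")
    d = []
    for i, a in enumerate(a_rows):
        row = []
        for j, b in enumerate(b_cols):
            combined = (a & b) if bit_op == "and" else (a ^ b)
            # Population count: number of set bits in the combined word.
            row.append(c[i][j] + bin(combined).count("1"))
        d.append(row)
    return d


if __name__ == "__main__":
    print(binary_mma_instruction("uint1"))
    print(binary_mma_reference([0b1011], [0b1111], [[10]], "and"))
```

A reference of this kind could be compared against the output of a compiled kernel to cover the missing binary correctness tests, provided the packed bit layout matches what the kernel expects; that layout is an assumption here and is not specified in the description.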
