MasterJH5574 opened a new pull request, #17618: URL: https://github.com/apache/tvm/pull/17618
This PR introduces the MLA attention kernels written in TIR and implements the MLA computation logic in the KV cache. A new unit test file is added to verify the correctness of the TIR kernels. This PR also fixes the tile size initialization of a few TIR prefill kernels.
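For context, MLA (multi-head latent attention) caches a low-rank latent projection of the hidden states instead of full per-head keys and values, and up-projects from that latent at attention time. The sketch below is a minimal single-head NumPy illustration of that computation, not the TIR kernels in this PR; all weight names and dimensions are illustrative.

```python
import numpy as np

def mla_attention(x, W_dkv, W_uk, W_uv, W_q):
    """Single-head MLA-style attention sketch (no causal mask, no RoPE).

    x:     (seq, d_model) hidden states
    W_dkv: (d_model, d_latent) down-projection; c_kv is what the KV cache stores
    W_uk:  (d_latent, d_head) key up-projection
    W_uv:  (d_latent, d_head) value up-projection
    W_q:   (d_model, d_head) query projection
    """
    c_kv = x @ W_dkv            # compressed latent KV, cached per token
    k = c_kv @ W_uk             # up-project keys from the latent
    v = c_kv @ W_uv             # up-project values from the latent
    q = x @ W_q
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ k.T) * scale
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

Because the up-projections are linear, this is mathematically equivalent to standard attention with merged key/value weights `W_dkv @ W_uk` and `W_dkv @ W_uv`, while the cache only needs to hold the smaller `c_kv`.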
