MasterJH5574 opened a new pull request, #16396: URL: https://github.com/apache/tvm/pull/16396
This PR adds inline RoPE computation to PagedKVCache, which unblocks the move toward sliding window attention and attention sinks. Both the FlashInfer and TIR kernels are updated in this PR to perform the RoPE calculation. Note that the FlashInfer version is bumped in order to include the RoPE update. The standalone kernels previously used for RoPE application are therefore removed.

---

Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
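
For readers unfamiliar with the distinction, here is a minimal pure-Python sketch of a standalone RoPE pass versus RoPE computed inline in the attention loop. It is an illustration only, not the FlashInfer or TIR kernel code from this PR. The function names (`rope`, `scores_standalone`, `scores_inline`), the interleaved pairing of dimensions, and the base `theta=10000.0` are all assumptions chosen for the example; it also assumes that "inline" means the cache holds unrotated keys and the rotation is applied when the keys are read.

```python
import math


def rope(vec, pos, theta=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle pos * theta**(-i/d).

    Assumes an even-length vector and interleaved pairing (hypothetical choice).
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scores_standalone(q, q_pos, keys):
    # Separate pass: keys are rotated before they are stored in the cache.
    cache = [rope(k, p) for p, k in enumerate(keys)]
    rq = rope(q, q_pos)
    return [dot(rq, k) for k in cache]


def scores_inline(q, q_pos, keys):
    # Inline: the cache holds raw keys, and the rotation happens inside the
    # attention loop, using each entry's position at read time.
    cache = list(keys)
    rq = rope(q, q_pos)
    return [dot(rq, rope(k, p)) for p, k in enumerate(cache)]
```

Both functions give the same attention scores. In this sketch, the practical difference is that the inline variant picks each key's position when the cache is read, so the positions can be reassigned later (for example after older entries are evicted) without rewriting the stored keys.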
