MasterJH5574 opened a new pull request, #16729: URL: https://github.com/apache/tvm/pull/16729
This PR adds sliding window attention and attention sink support to PagedKVCache, so that PagedKVCache can back models such as Mistral.

It also removes the "Attention" function (without fused QKV) from the AttentionKVCache interface, since everything it did is now covered by the "AttentionWithFusedQKV" function. Given the maintenance cost, we are removing it for now and will add it back if it is needed in the future.

Finally, this PR aligns the global function names of PagedKVCache with those of the KVState introduced earlier.
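As a conceptual illustration of how the two features combine, the sketch below computes which token positions a new query can still attend to. It keeps the first `sink_size` tokens permanently as attention sinks and, beyond those, only the most recent `window_size` tokens. The helper name `kept_positions` and its parameters are hypothetical and are not TVM's PagedKVCache API. The sketch also assumes that `window_size` counts only the non-sink tokens, which may differ from how the PR defines the window.

```python
from typing import List


def kept_positions(seq_len: int, window_size: int, sink_size: int) -> List[int]:
    """Return the positions whose K/V entries remain visible to the next query.

    Hypothetical helper for illustration only (not the TVM API).
    Assumption: window_size excludes the sink tokens.
    """
    # Short sequences fit entirely within the sinks plus the window.
    if seq_len <= sink_size + window_size:
        return list(range(seq_len))
    # Attention sinks: the first few tokens are never evicted.
    sinks = list(range(sink_size))
    # Sliding window: only the most recent window_size tokens are kept.
    recent = list(range(seq_len - window_size, seq_len))
    return sinks + recent


if __name__ == "__main__":
    # With 10 tokens, a window of 4, and 2 sinks, tokens 2..5 have been evicted.
    print(kept_positions(10, 4, 2))  # [0, 1, 6, 7, 8, 9]
```

Because the kept set is bounded by `sink_size + window_size`, the per-sequence cache footprint stays constant once a sequence grows past that size. This is what makes sliding-window models like Mistral practical to serve from a paged cache.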
