MasterJH5574 opened a new pull request, #16729:
URL: https://github.com/apache/tvm/pull/16729

   This PR adds support for sliding window attention and attention sinks to
PagedKVCache, so that PagedKVCache can back models such as Mistral.
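For readers unfamiliar with the two mechanisms, the following is a minimal conceptual sketch (not TVM code) of which token positions a KV cache keeps when it combines a sliding window with attention sinks. It assumes that the first `sink_size` tokens are always pinned and that the window holds the `window_size` most recent tokens in addition to them. The names `retained_positions` and `SlidingWindowCache` are hypothetical. The real PagedKVCache manages paged K/V tensors on device, and its exact window semantics may differ.

```python
from typing import List


def retained_positions(seq_len: int, window_size: int, sink_size: int) -> List[int]:
    """Hypothetical helper: absolute token positions kept in a KV cache that
    pins the first `sink_size` tokens (attention sinks) and keeps the
    `window_size` most recent tokens (sliding window)."""
    if seq_len <= sink_size + window_size:
        # Nothing has been evicted yet: every token is still cached.
        return list(range(seq_len))
    sinks = list(range(sink_size))
    recent = list(range(seq_len - window_size, seq_len))
    return sinks + recent


class SlidingWindowCache:
    """Hypothetical toy cache that stores token positions as stand-ins for
    the K/V entries, evicting incrementally as tokens are appended."""

    def __init__(self, window_size: int, sink_size: int) -> None:
        self.window_size = window_size
        self.sink_size = sink_size
        self.positions: List[int] = []
        self.seq_len = 0

    def append(self) -> None:
        # Cache the newly decoded token.
        self.positions.append(self.seq_len)
        self.seq_len += 1
        # Once capacity is exceeded, evict the oldest non-sink entry.
        if len(self.positions) > self.sink_size + self.window_size:
            del self.positions[self.sink_size]


if __name__ == "__main__":
    cache = SlidingWindowCache(window_size=4, sink_size=2)
    for _ in range(10):
        cache.append()
    print(cache.positions)
```

Running it prints `[0, 1, 6, 7, 8, 9]`: the two sink tokens survive, and only the four most recent tokens remain from the rest of the sequence. Setting `sink_size=0` reduces this to a plain sliding window.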
   
   Meanwhile, this PR removes the "Attention" function (without fused QKV) from
the AttentionKVCache interface, since its usage is now fully covered by the
"AttentionWithFusedQKV" function. Given the maintenance cost, we remove it for
now and will add it back if it is needed in the future.
   
   This PR also unifies the global function names of PagedKVCache with those of
the KVState introduced earlier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.