davidpissarra opened a new pull request, #15963:
URL: https://github.com/apache/tvm/pull/15963

   Part of the effort on Sliding Window Attention (SWA), see
https://github.com/mlc-ai/mlc-llm/issues/1003. Overwriting the cache is useful
when computing SWA, because it lets us keep a more efficient cache that holds
only the keys and values of the current window. Once the cache is full, we
start overwriting the oldest entries.
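
   The overwrite policy described above behaves like a ring buffer indexed
modulo the window size. Below is a minimal Python sketch, assuming a
hypothetical `SlidingWindowKVCache` class that stores one key/value pair per
token. It is not the TVM API introduced by this PR. A real cache would hold
tensors rather than Python objects.

```python
from typing import Any, List, Tuple


class SlidingWindowKVCache:
    """Hypothetical fixed-size KV cache that overwrites the oldest entry once full.

    Illustrative sketch only; not the TVM runtime API added by this PR.
    """

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.keys: List[Any] = [None] * window_size
        self.values: List[Any] = [None] * window_size
        self.total_appended = 0  # number of tokens ever appended

    def append(self, key: Any, value: Any) -> None:
        # The slot index wraps around, so once the cache is full this
        # overwrites the oldest entry instead of growing the buffer.
        slot = self.total_appended % self.window_size
        self.keys[slot] = key
        self.values[slot] = value
        self.total_appended += 1

    def window(self) -> Tuple[List[Any], List[Any]]:
        """Return the keys and values currently in the window, oldest first."""
        size = min(self.total_appended, self.window_size)
        # Before the first wrap the oldest entry is at slot 0; afterwards it is
        # the slot that the next append would overwrite.
        if self.total_appended >= self.window_size:
            start = self.total_appended % self.window_size
        else:
            start = 0
        order = [(start + i) % self.window_size for i in range(size)]
        return [self.keys[i] for i in order], [self.values[i] for i in order]


if __name__ == "__main__":
    cache = SlidingWindowKVCache(window_size=3)
    for t in range(5):
        cache.append(f"k{t}", f"v{t}")
    keys, values = cache.window()
    print(keys)  # ['k2', 'k3', 'k4']
```

   Memory stays fixed at `window_size` entries regardless of sequence length,
and the only extra bookkeeping is a single counter of appended tokens.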
   
   cc @tqchen 

