[PR] [Unity][SWA] Overriding windowed cache support [tvm]

via GitHub Sat, 21 Oct 2023 12:33:51 -0700


davidpissarra opened a new pull request, #15963:
URL: https://github.com/apache/tvm/pull/15963


   Part of the Sliding Window Attention (SWA) effort tracked in 
https://github.com/mlc-ai/mlc-llm/issues/1003. Overwriting the cache in place 
is useful when computing SWA: the cache only has to hold the keys and values 
of the current window, which makes it more efficient. Once the cache is full, 
new entries start overwriting the oldest ones.
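
   The overwrite behavior described above can be illustrated with a minimal
ring-buffer sketch. This is not the code in this PR and not a TVM API: the
class `WindowedKVCache` and its methods are hypothetical names, and plain
Python lists stand in for the key/value tensors.

```python
class WindowedKVCache:
    """Hypothetical fixed-size cache holding only the last `window` entries."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.keys = [None] * window
        self.values = [None] * window
        self.num_seen = 0  # total number of entries ever appended

    def append(self, key, value):
        # While the cache is filling up, slots are used in order. Once it is
        # full, the modulo wraps around and the oldest slot is overwritten.
        slot = self.num_seen % self.window
        self.keys[slot] = key
        self.values[slot] = value
        self.num_seen += 1

    def ordered(self):
        """Return (keys, values) ordered from oldest to newest."""
        if self.num_seen <= self.window:
            return self.keys[:self.num_seen], self.values[:self.num_seen]
        start = self.num_seen % self.window  # slot of the oldest live entry
        return (self.keys[start:] + self.keys[:start],
                self.values[start:] + self.values[:start])


cache = WindowedKVCache(window=3)
for t in range(5):
    cache.append(f"k{t}", f"v{t}")
keys, values = cache.ordered()  # entries for t = 2, 3, 4 remain
```

   With a window of 3 and 5 appended entries, the two oldest entries have
been overwritten, so memory use stays bounded by the window size regardless
of sequence length.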
   
   cc @tqchen 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
