davidpissarra opened a new pull request, #15963: URL: https://github.com/apache/tvm/pull/15963
Part of the Sliding Window Attention (SWA) effort tracked in https://github.com/mlc-ai/mlc-llm/issues/1003. Overriding cache entries is useful when computing SWA: the cache only needs to hold the keys and values of the current window, which makes it more memory-efficient. Once the cache is full, new entries start overriding the oldest ones.

cc @tqchen
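
For illustration, the overriding behaviour described above can be sketched in plain Python as a ring buffer. This is a minimal sketch under stated assumptions, not the implementation in this PR: `SlidingWindowCache`, `append`, `window`, and `window_size` are hypothetical names, and real keys and values would be tensors rather than strings.

```python
class SlidingWindowCache:
    """Hypothetical fixed-capacity cache for illustration; not the TVM API."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.entries = []    # holds at most window_size (key, value) pairs
        self.next_slot = 0   # slot to override once the cache is full

    def append(self, key, value):
        if len(self.entries) < self.window_size:
            # Cache not yet full: grow as usual.
            self.entries.append((key, value))
        else:
            # Cache full: override the oldest entry in place.
            self.entries[self.next_slot] = (key, value)
            self.next_slot = (self.next_slot + 1) % self.window_size

    def window(self):
        """Return the cached pairs ordered from oldest to newest."""
        return self.entries[self.next_slot:] + self.entries[:self.next_slot]


cache = SlidingWindowCache(window_size=3)
for t in range(5):
    cache.append(f"k{t}", f"v{t}")
```

After five appends with a window of three, the cache still holds three entries, and the pairs for steps 0 and 1 have been overridden by those for steps 3 and 4. Memory use stays bounded by the window size regardless of sequence length.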
