tqchen opened a new pull request, #14478:
URL: https://github.com/apache/tvm/pull/14478

   This PR provides a simple implementation of an in-place attention KV cache for 
the Relax runtime. The main goal is to help us enable auto-regressive decoding 
in Relax quickly.
   
   This is likely not the only way to support an attention KV cache. We keep the 
implementation private for now and will continue to evolve the relevant code.
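
   For readers unfamiliar with the idea, here is a minimal sketch of what an 
in-place KV cache does in general. It is not the code in this PR: the class 
name `InplaceKVCache` and its methods `append` and `view` are hypothetical, 
and plain Python lists stand in for runtime tensors. The assumption is that 
the buffer is reserved once at a fixed maximum length, and each decoding step 
writes its new row into the next free slot instead of reallocating.

```python
from typing import List


class InplaceKVCache:
    """Hypothetical illustration of an in-place cache; not the PR's class."""

    def __init__(self, max_len: int, dim: int) -> None:
        # Reserve the whole buffer once so decoding steps never reallocate it.
        self.data: List[List[float]] = [[0.0] * dim for _ in range(max_len)]
        self.max_len = max_len
        self.dim = dim
        self.fill_count = 0  # number of valid rows currently stored

    def append(self, row: List[float]) -> None:
        """Write one new key (or value) row into the next free slot."""
        if len(row) != self.dim:
            raise ValueError("row has the wrong dimension")
        if self.fill_count >= self.max_len:
            raise RuntimeError("cache is full")
        # Slice assignment overwrites the reserved row object in place.
        self.data[self.fill_count][:] = row
        self.fill_count += 1

    def view(self) -> List[List[float]]:
        """Return the valid rows; the row objects are shared, not copied."""
        return self.data[: self.fill_count]


# One row is appended per decoding step; attention then reads cache.view().
cache = InplaceKVCache(max_len=4, dim=2)
for step_row in ([0.1, 0.2], [0.3, 0.4]):
    cache.append(step_row)
print(cache.view())
```

   The point of this design is that each decoding step costs one row write 
rather than a copy of the whole history. A real runtime would also need to 
handle batching, multiple layers, and device memory, none of which this 
sketch attempts.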

