[GitHub] [tvm] tqchen opened a new pull request, #14478: [Unity][VM] Add Attention KV cache builtin

via GitHub Mon, 03 Apr 2023 14:54:21 -0700


tqchen opened a new pull request, #14478:
URL: https://github.com/apache/tvm/pull/14478


   This PR provides a simple implementation of an in-place attention KV cache
for the Relax runtime. The main goal is to help us quickly enable
auto-regressive decoding in Relax.
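   To illustrate what "in-place" means for a KV cache, here is a minimal,
hypothetical sketch. It is not the code in this PR and does not use any TVM
API. `AttentionKVCache`, `append`, and `view` are illustrative names, and the
capacity-doubling growth policy is an assumption. Storage is pre-allocated,
each decoding step writes one token's vector into the next free slot, and
attention reads the filled prefix. A real attention layer would keep one such
cache for keys and one for values.

```python
from typing import List


class AttentionKVCache:
    """Hypothetical in-place KV cache sketch (not TVM's implementation).

    Pre-allocates `capacity` rows of size `dim` and writes each decoding
    step's vector into the next free row. Storage doubles when full
    (an assumed growth policy).
    """

    def __init__(self, dim: int, capacity: int = 4) -> None:
        if dim <= 0 or capacity <= 0:
            raise ValueError("dim and capacity must be positive")
        self.dim = dim
        self.length = 0  # number of valid (filled) rows
        # Pre-allocated storage; only rows [0, length) hold real data.
        self._data: List[List[float]] = [[0.0] * dim for _ in range(capacity)]

    @property
    def capacity(self) -> int:
        return len(self._data)

    def append(self, row: List[float]) -> None:
        """Write one new token's vector in place, growing storage if full."""
        if len(row) != self.dim:
            raise ValueError(f"expected row of size {self.dim}, got {len(row)}")
        if self.length == self.capacity:
            # Double the capacity so appends stay amortized O(dim).
            extra = self.capacity
            self._data.extend([0.0] * self.dim for _ in range(extra))
        # Overwrite the pre-allocated slot rather than rebuilding the buffer.
        self._data[self.length][:] = row
        self.length += 1

    def view(self) -> List[List[float]]:
        """Return a copy of the filled prefix used for attention over past tokens."""
        return [r[:] for r in self._data[: self.length]]


if __name__ == "__main__":
    k_cache = AttentionKVCache(dim=2, capacity=2)
    for step in range(3):
        k_cache.append([float(step), float(step) * 10])
    print(k_cache.length, k_cache.capacity)  # 3 4
    print(k_cache.view())  # [[0.0, 0.0], [1.0, 10.0], [2.0, 20.0]]
```

   Writing into pre-allocated slots avoids reallocating and copying the whole
history on every decoding step. The cost is reserving memory ahead of use.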
   
   This is likely not the only way to support an attention KV cache. We keep
the implementation private for now and will continue to evolve the relevant code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
