MasterJH5574 opened a new pull request, #16396:
URL: https://github.com/apache/tvm/pull/16396

   This PR adds inline RoPE computation to PagedKVCache, which unblocks 
the work towards sliding window and attention sink support.
   
   Both the FlashInfer and TIR kernels are updated in this PR to include the 
RoPE calculation. Note that FlashInfer is bumped in order to pick up the RoPE update.
   
   The previous standalone kernel used for RoPE application is therefore removed.
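
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of what applying RoPE "inline" means: the rotation is applied to the query and key when the attention score is computed, rather than in a separate pass beforehand. This is an illustration only, not the TIR or FlashInfer kernel code. The names `apply_rope` and `attention_score`, the interleaved pairing of dimensions, and the base `theta=10000.0` are all assumptions made for the example.

```python
import math


def apply_rope(vec, position, theta=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Hypothetical helper: pairs (vec[2i], vec[2i+1]) are rotated by
    position * theta ** (-2i / dim). Real kernels may pair dimensions
    differently.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    out = [0.0] * dim
    for i in range(dim // 2):
        angle = position * theta ** (-2.0 * i / dim)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * cos_a - y * sin_a
        out[2 * i + 1] = x * sin_a + y * cos_a
    return out


def attention_score(q, k, q_pos, k_pos):
    """Dot-product score with RoPE applied at score time (inline).

    The inputs q and k are unrotated; positions are supplied by the caller.
    """
    q_rot = apply_rope(q, q_pos)
    k_rot = apply_rope(k, k_pos)
    return sum(a * b for a, b in zip(q_rot, k_rot))
```

In this sketch the score depends only on the offset between `q_pos` and `k_pos`, and the positions are chosen at score time rather than fixed when a key is produced. That flexibility is presumably what makes it easier to change the positions later, as sliding window and attention sink require.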
   
   ---
   
   Co-authored-by: Bohan Hou <[email protected]>
   Co-authored-by: Hongyi Jin <[email protected]>

