This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
     new a7be540457 [KVCache] Initialize one extra page than specified (#16849)
a7be540457 is described below

commit a7be540457d38aebf65cd36c3f0df3330921a376
Author: Ruihang Lai <[email protected]>
AuthorDate: Sun Apr 7 08:41:18 2024 -0400

    [KVCache] Initialize one extra page than specified (#16849)
    
    This PR updates PagedKVCache to initialize one more page than
    specified via the constructor. Applications usually depend on the
    number of free pages (returned from `GetNumAvailablePages`) to decide
    their KV cache operation policy. Without this extra page, the KV
    cache reports no available pages even when the last allocated pages
    are not yet full. This can give applications the false impression
    that the KV cache is completely full, and cause further issues.
---
 src/runtime/relax_vm/paged_kv_cache.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/runtime/relax_vm/paged_kv_cache.cc b/src/runtime/relax_vm/paged_kv_cache.cc
index e16d79885e..0c635967f2 100644
--- a/src/runtime/relax_vm/paged_kv_cache.cc
+++ b/src/runtime/relax_vm/paged_kv_cache.cc
@@ -1790,7 +1790,7 @@ TVM_REGISTER_GLOBAL("vm.builtin.paged_attention_kv_cache_create")
       int64_t prefill_chunk_size = cache_config[2];
       int64_t page_size = cache_config[3];
       bool support_sliding_window = cache_config[4];
-      int64_t num_total_pages = (total_token_capacity + page_size - 1) / page_size;
+      int64_t num_total_pages = (total_token_capacity + page_size - 1) / page_size + 1;
       if (support_sliding_window) {
         // When sliding window is enabled, each sequence may use two more pages at most.
         num_total_pages += reserved_num_seqs * 2;
@@ -1827,7 +1827,7 @@ TVM_REGISTER_GLOBAL("vm.builtin.paged_attention_kv_cache_create_reduced")
       int64_t prefill_chunk_size = cache_config[2];
       int64_t page_size = cache_config[3];
       bool support_sliding_window = cache_config[4];
-      int64_t num_total_pages = (total_token_capacity + page_size - 1) / page_size;
+      int64_t num_total_pages = (total_token_capacity + page_size - 1) / page_size + 1;
       if (support_sliding_window) {
         // When sliding window is enabled, each sequence may use two more pages at most.
         num_total_pages += reserved_num_seqs * 2;
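The page-count arithmetic changed above can be sketched in standalone C++. This is a hypothetical illustration, not TVM code. `ComputeNumTotalPages` and `NumFreePagesSingleSeq` are made-up helpers. The free-page count is a toy model that assumes a single sequence whose pages are allocated on demand. The sketch shows three things: the ceiling division, the new `+ 1`, and why a cache sized exactly to its capacity reports zero free pages while its last page is only partly filled.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the page-count arithmetic in the diff;
// parameter names follow the diff, but this function is illustrative only.
constexpr int64_t ComputeNumTotalPages(int64_t total_token_capacity, int64_t page_size,
                                       bool support_sliding_window,
                                       int64_t reserved_num_seqs) {
  // Ceiling division: the fewest pages that hold total_token_capacity tokens.
  int64_t num_total_pages = (total_token_capacity + page_size - 1) / page_size;
  // The one extra page introduced by this commit.
  num_total_pages += 1;
  if (support_sliding_window) {
    // With sliding window, each sequence may use at most two more pages.
    num_total_pages += reserved_num_seqs * 2;
  }
  return num_total_pages;
}

// Hypothetical toy model (not the TVM API): free pages left when a single
// sequence holds tokens_in_use tokens and its last page may be partly filled.
constexpr int64_t NumFreePagesSingleSeq(int64_t num_total_pages, int64_t tokens_in_use,
                                        int64_t page_size) {
  int64_t pages_in_use = (tokens_in_use + page_size - 1) / page_size;
  return num_total_pages - pages_in_use;
}

// 4096 tokens with 16-token pages: 256 pages for capacity, plus the extra one.
static_assert(ComputeNumTotalPages(4096, 16, false, 4) == 257, "");
// Without the extra page: 4090 tokens occupy all 256 pages (the last one only
// partially), so 0 free pages are reported although 6 token slots remain.
static_assert(NumFreePagesSingleSeq(256, 4090, 16) == 0, "");
// With the extra page, one free page is still reported.
static_assert(NumFreePagesSingleSeq(257, 4090, 16) == 1, "");
```

The checks use `static_assert` so the sketch verifies itself at compile time. It needs C++14 or later because the `constexpr` functions mutate local variables.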
