MasterJH5574 opened a new pull request, #16824: URL: https://github.com/apache/tvm/pull/16824
This PR introduces the class `PagedKVCacheAuxDataManager` for PagedKVCache. The class manages all the integer auxiliary data required for paged attention and other KV cache operations, such as page table arrays and position arrays.

Prior to this PR, we issued a separate host-to-device copy for each auxiliary array. This may cause extra overhead, since these auxiliary arrays are usually lightweight. One simple idea is to "merge" all the auxiliary arrays into a single large array and take slices of it for each original auxiliary array. By doing this, we need to issue only a single host-to-device copy for all the auxiliary arrays together.

The introduction of `PagedKVCacheAuxDataManager` abstracts the interface through which PagedKVCache copies host arrays to device arrays, enabling us to support both the previous way of copying and the new way.

To support slicing for attention-related TIR functions, we introduce `elem_offset` match in TIR functions in this PR. This PR also bumps FlashInfer to support the auxiliary array slicing.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
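The "merge, then slice" idea described in this PR can be illustrated with a minimal Python sketch. This is not the TVM implementation: the helper names `pack_aux_arrays` and `slice_view` and the sample array names are hypothetical, and a `memoryview` over a second host array stands in for the device buffer, so the "copy" here is only a placeholder for one host-to-device transfer. The recorded per-array offset plays the role that `elem_offset` plays for a sliced buffer.

```python
from array import array


def pack_aux_arrays(aux_arrays):
    """Concatenate named integer lists into one contiguous host buffer.

    Returns the merged buffer and a layout dict that maps each name to
    (elem_offset, length) inside the merged buffer. Hypothetical helper.
    """
    merged = array("i")  # C int elements; typically 32-bit
    layout = {}
    for name, values in aux_arrays.items():
        layout[name] = (len(merged), len(values))
        merged.extend(values)
    return merged, layout


def slice_view(device_buffer, layout, name):
    """Return a zero-copy view of one original array inside the merged buffer."""
    elem_offset, length = layout[name]
    return device_buffer[elem_offset:elem_offset + length]


# Illustrative auxiliary arrays (made-up values).
host_arrays = {
    "page_indptr": [0, 2, 5],
    "page_indices": [3, 7, 1, 4, 9],
    "positions": [0, 1, 2, 3],
}

merged, layout = pack_aux_arrays(host_arrays)

# Stand-in for the single host-to-device copy: one bulk copy of the merged
# buffer instead of one copy per auxiliary array.
device_buffer = memoryview(array("i", merged))

page_indices = slice_view(device_buffer, layout, "page_indices")
positions = slice_view(device_buffer, layout, "positions")
```

Each consumer then receives a view that starts at a nonzero element offset into the shared buffer, which is why the kernels reading these arrays need to accept an offset rather than assume the data starts at element 0.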
