yelite commented on code in PR #15064:
URL: https://github.com/apache/tvm/pull/15064#discussion_r1224614780
##########
src/runtime/relax_vm/lm_support.cc:
##########
@@ -167,12 +167,59 @@ class AttentionKVCache : public ObjectRef {
TVM_REGISTER_OBJECT_TYPE(AttentionKVCacheObj);
+/*!
+ * \brief Create multiple KV caches with the same shape from a single memory allocation.
+ * \param init_data The initial data to put into the cache. Ignored if init_fill_count is
+ * less than 0.
+ * \param reserve_shape The shape of each cache.
+ * \param init_fill_count The number of initial rows to fill into the cache.
+ * \param num_caches The number of caches to create.
+ */
+Array<AttentionKVCache> CreateMultipleKVCaches(NDArray init_data, ShapeTuple reserve_shape,
+                                               int init_fill_count, int num_caches) {
+ DLDataType dtype = init_data->dtype;
+
+ int64_t cache_size = (dtype.bits * dtype.lanes + 7) / 8;
+ for (const auto dim : reserve_shape) {
+ cache_size *= dim;
+ }
+
+ // Add padding so that each cache is aligned to kAllocAlignment
+ using tvm::runtime::kAllocAlignment;
+ int64_t padding = (kAllocAlignment - cache_size % kAllocAlignment) % kAllocAlignment;
+ int64_t cache_offset = cache_size + padding;
+
+ auto block = NDArray::Empty(ShapeTuple({cache_offset * num_caches}), dtype, init_data->device);
+ auto block_view = block.CreateView(reserve_shape, dtype);
+
+ Array<AttentionKVCache> result;
+ for (int i = 0; i < num_caches; ++i) {
+ // Use DLManagedTensor to prevent underlying memory from being freed
+ DLManagedTensor* data_view = block_view.ToDLPack();
Review Comment:
Thanks! I updated the code to use the storage interface, and it looks cleaner. However, it can now print a warning message if the requested allocator type does not match the allocator created at VM initialization.