[GitHub] [tvm] tqchen commented on a diff in pull request #15064: [Unity] Add an API to create multiple kv caches with single allocation

via GitHub Sat, 10 Jun 2023 07:54:48 -0700


tqchen commented on code in PR #15064:
URL: https://github.com/apache/tvm/pull/15064#discussion_r1225409068



##########
src/runtime/relax_vm/lm_support.cc:
##########
@@ -167,12 +168,56 @@ class AttentionKVCache : public ObjectRef {
 
 TVM_REGISTER_OBJECT_TYPE(AttentionKVCacheObj);
 
+/*!
+ * \brief Create multiple kv caches with same shape, from single memory allocation.
+ * \param init_data The initial data to put into the cache. Ignored if init_fill_count is
+ *        less than 0.
+ * \param reserve_shape The shape of cache.
+ * \param init_fill_count The initial row to fill into
+ *        the cache.
+ * \param num_caches Number of caches to create.
+ */
+Array<AttentionKVCache> CreateMultipleKVCaches(NDArray init_data, ShapeTuple reserve_shape,
+                                               int init_fill_count, int num_caches) {
+  DLDataType dtype = init_data->dtype;
+
+  int64_t cache_size = (dtype.bits * dtype.lanes + 7) / 8;

Review Comment:
   I think this is fine for now, since sub-byte types are usually packed
   manually (the dtype is i32).
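
   To make the arithmetic concrete, here is a minimal sketch of the rounding used in the diff. `DTypeSketch`, `ElementBytes`, and `BufferBytes` are hypothetical names for illustration only, not part of TVM or DLPack; the struct mirrors just the `bits` and `lanes` fields that the formula reads. With manual packing the dtype is i32, so the formula gives exactly 4 bytes per element. A true 4-bit dtype would be rounded up to 1 byte per element, which over-counts rather than under-counts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for DLDataType: only the two fields the formula uses.
struct DTypeSketch {
  uint8_t bits;    // bits per scalar lane
  uint16_t lanes;  // number of lanes (1 for scalar types)
};

// Same rounding as the diff: bytes per element, rounded up to a whole byte.
inline int64_t ElementBytes(DTypeSketch dtype) {
  assert(dtype.bits > 0 && dtype.lanes > 0);
  return (static_cast<int64_t>(dtype.bits) * dtype.lanes + 7) / 8;
}

// Total bytes for num_elements elements under that per-element rounding.
inline int64_t BufferBytes(DTypeSketch dtype, int64_t num_elements) {
  return ElementBytes(dtype) * num_elements;
}
```

   Under these assumptions, eight 4-bit values packed by hand into one i32 element take `BufferBytes(DTypeSketch{32, 1}, 1)` bytes, while the same eight values declared with a 4-bit dtype would be sized as `BufferBytes(DTypeSketch{4, 1}, 8)` bytes.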



